Preface



This volume contains the proceedings of the Eleventh International Conference 
on Computer-Aided Verification (CAV’99), held at Trento, Italy on July 6-10. 
The CAV conferences are dedicated to the advancement of the theory and practice
of computer-assisted formal analysis methods for software and hardware systems.
The conference covers the spectrum from theoretical results to concrete
applications and tools and has traditionally drawn contributions from both
researchers and practitioners in both academia and industry. This year we received
107 submissions, out of which we accepted 34. We also accepted five short tool 
presentations. 

CAV included a tutorial day this year, with four invited tutorials by
Edmund M. Clarke (CMU) on Symbolic Model Checking, David Dill (Stanford) on
Alternative Approaches to Hardware Verification, Joseph Sifakis (VERIMAG) on
The Compositional Specification of Timed Systems, and Rajeev Alur (UPenn)
on Timed Automata. The conference also included four invited talks by Gunnar
Stålmarck (Prover Technology) on Stålmarck’s Method and QBF Solving, Zohar
Manna (Stanford University) on Visual Verification of Parameterized Programs
(joint with CADE), Ed Brinksma (University of Twente) on Formal Methods for
Conformance Testing: Theory Can Be Practical!, and Alain Deutsch (INRIA) on
Abstract Interpretation Applied to the Verification of Ariane 5 Software.

The conference program included papers on a wide variety of topics, inclu- 
ding microprocessor verification, verification and testing of protocols, methods 
for verification of systems with infinite state spaces, the theory of verification, 
verification of temporal logic properties, modeling of systems, symbolic model 
checking, theorem proving, combining theorem proving with model-checking, 
model-checking techniques based on automata theory and abstraction methods. 

Many industrial companies have shown wide interest in CAV, ranging from 
using the presented technologies to developing and marketing their own tech- 
niques and tools. We would like to thank the following generous and forward- 
looking companies for their sponsorship of CAV’99: 

- Intel 

- The John von Neumann Minerva Center for Verification of Reactive 
Systems at the Weizmann Institute 

- Lucent Technologies 

- Siemens 

- ST Microelectronics 

- SUN Microsystems 

The Steering Committee consists of the conference founders: 

Edmund M. Clarke (Carnegie Mellon University), Robert P. Kurshan
(Bell Laboratories), Amir Pnueli (The Weizmann Institute of Technology)
and Joseph Sifakis (VERIMAG)



The conference program was selected by the program committee, which included 
this year: 




Gerard Berry (Ecole des Mines, France), Ahmed Bouajjani (VERIMAG,
France), Ching-Tsun Chou (Intel, USA), Edmund M. Clarke (CMU, 
USA), Werner Damm (Oldenburg U., Germany), David Dill (Stanford 
U., USA), Allen Emerson (Austin U., USA), Javier Esparza (Munich 
U., Germany), Limor Fix (Intel, Israel), Mike Gordon (Cambridge U., 
UK), Nicolas Halbwachs (co-chair, VERIMAG, France), Tom Henzinger 
(Berkeley, USA), Alan Hu (UBC, Canada), Bengt Jonsson (Uppsala U., 
Sweden), Robert P. Kurshan (Bell Labs, USA), Gavin Lowe (Leicester 
U., UK), Ken McMillan (Cadence, USA), Doron Peled (co-chair, Bell
Labs, USA and Technion, Israel), Carl Pixley (Motorola, USA), A. Pra- 
sad Sistla (Chicago U., USA), Fabio Somenzi (Colorado U., USA), Man- 
dayam Srivas (SRI, USA), Antti Valmari (Tampere U. Techn., Finland), 
Yaron Wolfsthal (IBM, Israel) and Pierre Wolper (Liege U., Belgium). 

We would also like to thank the following additional reviewers: Mark Aagaard,
Luca de Alfaro, Pascalin Amagbegnon, Simon Ambler, Nina Amla,
Flemming Andersen, Eugene Asarin, Landver Avner, Neta Ayzenbood-Reshef,
Clark Barrett, Jason Baumgartner, Ilan Beer, Shoham Ben-David,
Sergey Berezin, Armin Biere, Roderick Bloem, Juergen Bohn, Bernard Boigelot,
Amar Bouali, Marius Bozga, Steve Brookes, Randy Bryant, Gianpiero Cabodi,
Paolo Camurati, Ilaria Castellani, Gerard Cece, Pankajkumar Chauhan,
Eve Coste-Maniere, Bruno Courcelle, Sadie Creese, Roy Crole, Eli Dichterman,
Jurgen Dingel, Cindy Eisner, Jean-Claude Fernandez, Ranan Fraer,
Martin Franzle, Richard Gault, Daniel Geist, Jaco Geldenhuys, Rob Gerth,
Boris Ginsburg, Michael Goldsmith, G. Gonthier, Shankar Govindaraju,
Susanne Graf, Orna Grumberg, Dilian Gurov, Gary Hachtel, John Harrison,
John Havlicek, Nevin Heintze, Keijo Heljanko, Juhana Helovuo, Tamir Heyman,
Hiroyuki Higuchi, Pei-Hsin Ho, Gerard Holzmann, Mei Lin Hui,
Marten van Hulst, Hardi Hungar, Warren Hunt, Amitai Iron, Jae-Young Jang,
Somesh Jha, Robert B. Jones, Bernhard Josko, Tommi Junttila, Gila Kamhi,
Konsta Karsisto, Shmuel Katz, Sharon Keidar, Pertti Kellomaki, Astrid Kiehn,
Barbara Koenig, Ilkka Kokkarinen, Antonin Kucera, Orna Kupferman,
Yassine Lakhnech, Avner Landver, Ranko Lazic, David Lee, Leonid Libkin,
Johan Lilius, Zhiming Liu, Yuan Lu, Enrico Mach, Sela Mador-Haim,
Will Marrero, Richard Mayr, Jon Millen, Marius Minea, In-Ho Moon,
John Moondanos, Laurent Mounier, Anca Muscholl, Kedar Namjoshi,
Peter Niebert, Juergen Niehaus, Aletta Nylen, Sven-Olof Nystrom,
John O’Leary, Atanas Parashkevov, Abelardo Pardo, David Park,
D. Stott Parker, Wojciech Penczek, Antoine Petit, Paul Pettersson, Avi Puder,
Shaz Qadeer, Stefano Quer, Sriram Rajamani, J-F Raskin, Kavita Ravi,
Joy Reed, Fran Rippel, Steven Roberts, Yoav Rodeh, Christine Roeckl,
Stefan Roemer, Marco Roveri, Valerie Roy, Harald Ruess, Vlad Rusu,
Peter Ryan, Hassan Saidi, Jun Sawada, Rainer Schloer, Steve Schneider,
Claus Schroeter, Stefan Schwoon, Ellen Sentovich, Tom Shiple,
Robert de Simone, Jens Skakkebaek, Dawn Song, Jeffrey Su, Don Syme,
Maciej Szreter, Serdar Tasiran, Javier Thayer, Horia Toma, Richard Trefler,
Stavros Tripakis, Irek Ulidowski, Moshe Vardi, Kimmo Varpaaniemi,
Bjorn Victor, Copal Vijayan, Heikki Virtanen, Frank Wallner,
Howard Wong-Toi, Bwolen Yang, Tali Yatzkar-Haham, Wang Yi,
Irfan Zakiuddin, Avi Ziv and Baruch Ziv.

This year, CAV was part of the Federated Logic Conference (FLoC’99),
and was organized jointly with CADE (Conference on Automated Deduction),
LICS (Logic in Computer Science) and RTA (Rewriting Techniques and
Applications). In addition, FLoC included 15 workshops associated with the
different conferences, panels and tutorials. We would like to acknowledge the
help of the FLoC’99 steering committee: Moshe Y. Vardi (General Chair),
Fausto Giunchiglia (Conference Chair), Leonid Libkin (Publicity Chair),
Paolo Traverso (CADE), Joseph Sifakis (CAV), Eugenio Moggi (LICS),
Simona Ronchi della Rocca (LICS), Andrea Asperti (RTA), Morena Carli,
Nadia Oss Papot (Secretariat), Alessandro Tuccio (Treasurer) and
Adolfo Villafiorita (National Publicity Chair and Workshops Coordinator).

Finally, we would like to extend special thanks to Richard Gerber for kindly
lending us his “START” conference management software, which was of
incredible help, and to Yannick Raoul for installing and adapting the software.



April 1999 



Nicolas Halbwachs 
Doron Peled 
Abstract

BDD-based symbolic model checking has received a great deal of attention
because of its potential for solving hardware verification problems. However,
there are other, qualitatively different approaches that are also quite promising
(with different strengths and weaknesses). This tutorial surveys a variety of
approaches based on symbolic simulation.

Symbolic simulation allows the user to set inputs to variables instead of 
constants, and propagates expressions containing those variables through the 
operators and expressions of the circuit. Symbolic simulation is attractive, be- 
cause it works for large designs and can be made to degrade gracefully when 
designs become too large. It has the disadvantage that it is difficult or
impossible to compute invariants automatically. By comparison, the main strength
of model checking is its ability to compute invariants via iterative fixed point
computations.
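
To make the idea of propagating expressions concrete, here is a minimal Python sketch. It is not taken from the tutorial: the netlist format, the tuple encoding of expressions, and the helper names (`var`, `const`, `gate`, `simulate`, `evaluate`) are all assumptions chosen for illustration. Inputs may be constants or variables; a gate folds to a constant only when every input is constant, and otherwise yields a symbolic expression.

```python
from itertools import product

# Boolean operators over lists of concrete values (hypothetical gate library).
OPS = {
    "not": lambda v: not v[0],
    "and": all,
    "or": any,
    "xor": lambda v: v[0] != v[1],
}

def var(name):
    return ("var", name)

def const(bit):
    return ("const", bool(bit))

def gate(op, args):
    """Return a constant if every input is constant, else a symbolic expression."""
    if all(a[0] == "const" for a in args):
        return const(OPS[op]([a[1] for a in args]))
    return (op,) + tuple(args)

def simulate(netlist, inputs):
    """Propagate input expressions through gates listed in topological order."""
    env = dict(inputs)
    for out, op, ins in netlist:
        env[out] = gate(op, [env[i] for i in ins])
    return env

def evaluate(expr, assignment):
    """Evaluate a symbolic expression under a concrete variable assignment."""
    if expr[0] == "var":
        return assignment[expr[1]]
    if expr[0] == "const":
        return expr[1]
    return bool(OPS[expr[0]]([evaluate(e, assignment) for e in expr[1:]]))

# A one-bit full adder as (output, operator, inputs) triples.
FULL_ADDER = [
    ("t", "xor", ("a", "b")),
    ("sum", "xor", ("t", "cin")),
    ("c1", "and", ("a", "b")),
    ("c2", "and", ("t", "cin")),
    ("cout", "or", ("c1", "c2")),
]

# Symbolic run: a and b are variables, cin is the constant 0.
symbolic = simulate(FULL_ADDER, {"a": var("a"), "b": var("b"), "cin": const(0)})
# Ordinary run: every input is a constant, so every wire folds to a constant.
concrete = simulate(FULL_ADDER, {"a": const(1), "b": const(1), "cin": const(1)})

# One symbolic run stands for all four concrete runs over a and b.
covers_all = all(
    evaluate(symbolic["sum"], {"a": a, "b": b}) == (a != b)
    for a, b in product([False, True], repeat=2)
)
```

The same `simulate` function serves both runs, which is the point of the technique: a conventional simulator is the special case in which no input is a variable.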

The tutorial discusses different approaches to symbolic simulation and ap- 
plications that make effective use of it, including abstraction methods and self- 
comparison. 
The Compositional Specification of Timed Systems - A Tutorial
Motivation

The analysis of reactive systems requires models representing the system, its 
interaction with the environment and taking into account features of the under- 
lying execution structure. It is important that such models are timed if analysis 
concerns performance, action scheduling or in general, dynamic aspects of the 
behavior. In practice, timed models of systems are obtained by adding timing 
constraints to untimed descriptions. For instance, given the functional descrip- 
tion of a circuit, the corresponding timed model can be obtained by adding 
timing constraints about propagation delays of the components; to build a ti- 
med model of a real-time software, quantitative timing information concerning 
execution times of the statements and significant changes of the environment 
must be added. 

The construction of timed models of reactive systems raises some important 
questions concerning their composition and in particular, the way some well- 
understood constructs for untimed systems can be extended to timed systems. 

In this tutorial, we present an overview of existing executable timed forma- 
lisms with a global notion of time, by putting emphasis on problems of com- 
positional description. The results on compositionality have been developed in 
collaboration with S. Bornot, at Verimag. 



Timed Formalisms 

Timed formalisms are extensions of untimed ones by adding clocks, real-valued
variables that can be tested and modified at transitions. Clocks measure the 
time elapsed at states. Timed automata [AD94,ACH+95], timed process algebras 
[NS91] and timed Petri nets can be considered as timed formalisms. 

The semantics of timed formalisms can be defined by means of transition 
systems that can perform time steps or (timeless) transitions. A state is a pair 
(s, v), consisting of a control state s (of the untimed system) and a valuation v of
the clocks. As a rule, transitions are specified by a guard (predicate) on clocks 
and an assignment of new values to clocks. They correspond to actions of the 
considered system. Time progress conditions are predicates on clocks associated 
with control states s that specify how time can progress: a time step of duration 
d can be performed from s only if all the intermediate states satisfy the time 
progress condition. 
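
The time step rule can be sketched in a few lines of Python. This is an illustration under a simplifying assumption that is not in the text: the time progress condition of a control state is a conjunction of upper bounds `clock <= bound`, stored in a hypothetical dictionary `tpc`. Because all clocks advance at the same rate, every intermediate state satisfies such a condition exactly when the final state does, so checking the end point is enough.

```python
def delay(valuation, d):
    """Let d time units elapse: every clock advances by d."""
    return {x: t + d for x, t in valuation.items()}

def can_delay(tpc, valuation, d):
    """True if a time step of duration d is allowed from this valuation.

    tpc maps clock names to upper bounds (clock <= bound). This encoding
    is an assumption of the sketch, not the general form of a predicate.
    """
    if d < 0:
        return False
    after = delay(valuation, d)
    return all(after[x] <= bound for x, bound in tpc.items())

def max_delay(tpc, valuation):
    """Longest time step allowed; infinite if the condition has no bound."""
    return min((bound - valuation[x] for x, bound in tpc.items()),
               default=float("inf"))

v = {"x": 2.0, "y": 0.0}
tpc = {"x": 5.0}  # time may progress only while x <= 5
```

Here `can_delay(tpc, v, 3)` holds and `can_delay(tpc, v, 3.5)` does not: after 3 time units, time is blocked in this control state until some transition is taken.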
An important feature of timed models is the possibility to express urgency
of an action (transition). An action enabled at a state (s, v) becomes urgent if
time cannot progress at v. As time cannot advance, the urgent action can be
executed. Expressing urgency is essential in modeling the real-time behavior of 
systems. However, stopping time progress to simulate urgency can be a source of
problems, especially when composing timed models. The independent description
of transitions and of time progress conditions may induce undesirable deadlock
situations (timelocks) where time cannot progress and no action is enabled.

To avoid timelocks, a class of timed formalisms has been studied where time 
progress conditions are associated with the transitions in the form of deadlines 
[SY96,BS98,BST97]. The deadline of a transition is a predicate on clocks which 
implies the associated guard and represents the set of the clock valuations at 
which the transition becomes urgent. Inclusion of deadlines in the corresponding 
guards implies time reactivity, that is, whenever time progress stops, there exists
at least one enabled transition. The use of deadlines has another interesting con- 
sequence. Each transition with the associated guard, deadline and assignment, 
corresponds to an elementary timed system, called timed action. 

We show how a timed transition system can be obtained as the composition 
of timed actions. 



Composition of Timed Systems 



As usual, the behavior of a timed system is obtained by composing the behavior 
of its components. Most of the work on the composition of timed systems concerns
timed process algebras. Very often it adopts a principle of independence
between timed and untimed behavior: transitions and time steps of the system
are obtained by composing independently the transitions and time steps of the
components. Furthermore, a strong synchrony assumption is adopted for time
progress. Time can progress in the system by some amount d only if all the
components agree to let time advance by d. This leads to elegant urgency
preserving semantics in the sense that component deadlines are respected. However,
this orthogonality between time progress and transitions may easily introduce
timelocks, especially when an untimed description with communication allowing
waiting, e.g. rendez-vous, is extended into a timed description. In such cases, it
is questionable whether the application of a strong synchronization rule for time
progress is always appropriate. For instance, if two systems are in states from
which they will never synchronize, it may be desirable not to further constrain
time progress by the strong synchronization rule.

As an alternative to urgency preserving semantics, flexible composition
semantics have been studied [BST97,BS98]. These semantics preserve time
reactivity. To avoid timelocks, urgency constraints are relaxed in some manner that
is shown to be optimal. The main idea behind flexible semantics is to adjust
waiting times of the components so as to achieve a desirable global behavior
satisfying, by construction, the following two sanity properties.
One property is time reactivity which can be guaranteed by construction
and is related to absence of timelock. Contrary to other stronger well-timedness 
properties, time reactivity is very easy to satisfy by construction. 

The second property is activity preservation and is related to absence of 
(local) deadlock. It requires that if some action can be executed after waiting by 
some time in a component, then some (not necessarily the same) action of the 
system can be executed, after waiting by some (not necessarily the same) time. 

The Compositional Framework 

We show how timed systems can be built from timed actions by preserving both 
time reactivity and activity of components. 

The set of the timed actions on a given set of clocks, set of control states and
vocabulary of action names, consists of transitions on control states, each labeled
by a tuple (a, g, d, f) where a is an action name, g is a guard, d is a deadline and f
is a function on clocks. The guard g and the deadline d are predicates on clocks
such that d implies g, representing respectively the set of enabling and the set
of the urgent states of the timed action. The function f represents the effect of
the execution on clock states.
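
A timed action can be sketched as a small data structure. The Python below is an illustration only: it assumes that guard and deadline are closed intervals over a single clock and that the function on clocks is a set of clocks to reset, none of which is required by the definition above. The class names `Interval` and `TimedAction` are hypothetical. The constructor rejects any action whose deadline does not imply its guard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; empty when lo > hi."""
    lo: float
    hi: float

    def contains(self, t):
        return self.lo <= t <= self.hi

    def is_empty(self):
        return self.lo > self.hi

    def subset_of(self, other):
        return self.is_empty() or (other.lo <= self.lo and self.hi <= other.hi)

@dataclass(frozen=True)
class TimedAction:
    name: str            # action name a
    clock: str           # the single clock tested by guard and deadline (assumption)
    guard: Interval      # g: the enabling states
    deadline: Interval   # d: the urgent states, must be included in g
    resets: frozenset = frozenset()  # f, restricted here to clock resets

    def __post_init__(self):
        if not self.deadline.subset_of(self.guard):
            raise ValueError("deadline must imply guard")

    def enabled(self, valuation):
        return self.guard.contains(valuation[self.clock])

    def urgent(self, valuation):
        return self.deadline.contains(valuation[self.clock])

    def execute(self, valuation):
        if not self.enabled(valuation):
            raise ValueError("action is not enabled")
        return {x: (0.0 if x in self.resets else t) for x, t in valuation.items()}

# Enabled while 2 <= x <= 5, urgent exactly at x = 5, resets x when executed.
act = TimedAction("a", "x", Interval(2, 5), Interval(5, 5), frozenset({"x"}))
```

Because the deadline is included in the guard, an urgent state is always an enabled state, which is the interval-level counterpart of time reactivity.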

A timed system is a set of timed actions. Following a standard process alge- 
bra approach, it can be described in an algebra of terms generated from some 
constant, representing the idle system, by using timed action prefixing, non de- 
terministic choice and recursion. Equality of terms is the congruence obtained 
by assuming associativity, commutativity and idempotence of non deterministic 
choice, that is, the labeled transition structures of the terms are bisimilar, where 
equality of two labels means identity of their action names and equivalence of 
the corresponding guards, deadlines and functions. 

We define two kinds of operators on timed systems: priority choice operators 
and parallel composition operators. The operators are timed extensions of unti- 
med operators. We give sufficient conditions for preserving both time reactivity 
and activity of components. 

Priority choice operators 

Priority is a very useful concept for modeling interrupts or preemption in
real-time systems. A well-known difficulty with introducing priorities is that they
are poorly compatible with compositionality and incrementality of specification
[BBK86,CH90,BGL97].

We define priority choice operators, that is, choice operators depending on a
relation between actions. This relation is an order on action names parameterized
by non negative reals representing degrees of priority. Roughly speaking, if action
a2 has priority over action a1 of degree d, then in the priority choice of two
timed actions with labels a2 and a1, action a1 will be disabled if action a2 will
be enabled within d time units. The main results concerning priority choice are
the following:

— Priority choice operators can be expressed in terms of non deterministic
choice operators, by restricting appropriately the guard and the deadline of
actions of lower priority. The restricted guards and deadlines can be specified
in a simple modal language. However, modalities are just a macronotation, 
as they represent quantification over time which can be eliminated. 

— We provide sufficient conditions on the priority order, for the priority
operators to be associative, commutative and idempotent. This result allows us to
consider priority choice operators as basic operators, generalizations of non
deterministic choice. The latter can be considered as the choice operator for 
the empty priority order. 

— We show that under these conditions, priority order operators preserve ac- 
tivity in the following sense: for every state, if an action a is enabled under 
the non deterministic choice then either a or a higher priority action will be 
enabled under the priority choice. 
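
The guard restriction behind priority choice can be illustrated with a short Python sketch. It follows one reading of the informal description above and makes assumptions that are not in the text: both guards are closed intervals `(lo, hi)` over the same clock, and "will be enabled within d time units" means the higher priority guard holds at some time in [t, t + d]. The function names are hypothetical.

```python
def contains(interval, t):
    lo, hi = interval
    return lo <= t <= hi

def enabled_soon(interval, t, degree):
    """True if the guard holds at some time in [t, t + degree]."""
    lo, hi = interval
    return t <= hi and t + degree >= lo

def low_priority_enabled(t, g1, g2, degree):
    """Restricted guard of a1 when a2 has priority over a1 of the given degree."""
    return contains(g1, t) and not enabled_soon(g2, t, degree)

g1 = (0, 10)  # guard of the lower priority action a1
g2 = (6, 8)   # guard of the higher priority action a2
```

With degree 2, a1 stays enabled up to time 3, is disabled from time 4 through 8 (a2 is enabled or about to be), and becomes enabled again after 8. Whenever a1 is disabled inside its own guard, a2 is enabled within the degree, which matches the activity preservation statement in the last item above.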

Parallel composition operators 

Parallel composition operators for timed systems are considered as extensions 
of parallel composition operators for untimed systems. We suppose, as usual, 
that the latter are defined in terms of choice operators and some associative and 
commutative synchronization operator on actions, by means of an expansion 
rule [Mil83,Mil89]. Synchronization operators associate with pairs of actions the 
action resulting from their synchronization. The main results concerning parallel 
composition operators are the following: 

— Parallel composition operators can be expressed in terms of choice operators, 
by appropriately extending the synchronization operators on timed actions. 
Synchronization operators are associative and commutative and compose 
componentwise the guards and the deadlines of the synchronizing actions. 

— For the composition of guards, different synchronization modes of practical 
interest are studied. Apart from the usual and-synchronization, where the
synchronization guard is the conjunction of the guards of the synchronizing
actions, we consider max-synchronization, which allows waiting, and
min-synchronization, which allows interruption by the fastest component.

— Parallel composition operators are associative and commutative if they are 
extensions of untimed operators satisfying the same properties. 

— We show that maximal progress can be achieved in synchronization by using 
priority choice in the expansion rules. Furthermore, we provide sufficient 
conditions for activity preservation. 
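
The three synchronization modes can be compared on a toy case. The Python sketch below assumes that each guard is a closed interval `(lo, hi)` measured on a common clock, so that the modes reduce to interval arithmetic. This interval reading is an assumption of the example; the framework itself defines the modes on general guards.

```python
def and_sync(g1, g2):
    """Both components must be enabled at the same time."""
    return (max(g1[0], g2[0]), min(g1[1], g2[1]))

def max_sync(g1, g2):
    """The earlier component waits for the later one."""
    return (max(g1[0], g2[0]), max(g1[1], g2[1]))

def min_sync(g1, g2):
    """The faster component may interrupt the slower one."""
    return (min(g1[0], g2[0]), min(g1[1], g2[1]))

def is_empty(g):
    return g[0] > g[1]

g1 = (1, 3)  # first component can act between times 1 and 3
g2 = (5, 7)  # second component can act between times 5 and 7
```

For these guards and-synchronization yields an empty guard, so the joint action can never occur. Max-synchronization yields `(5, 7)`, where the first component waits, and min-synchronization yields `(1, 3)`, where the first component does not wait for the second.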

The algebraic framework is completed by studying a simple algebra with 
synchronization operators for timed actions. We deduce laws for timed systems 
that take into account the structure of the actions and their properties.



Typed Actions - A Simplified Framework 

A practically interesting simplification of the theoretical framework comes from 
the (trivial) remark that any timed action can be expressed as the non deter- 
ministic choice between a lazy action and an eager action. A lazy action is an 
action whose set of urgent states is empty and an eager action has its deadline
equal to its guard. This allows us to consider only these two types of actions in
specifications and simplifies the rules for synchronization.

Sometimes it is useful in practice to consider a third type of urgency, delayable
actions. An action is delayable if its deadline is exactly the falling edge
of the guard. That is, it cannot be disabled without becoming urgent. We show
that parallel composition of systems with delayable actions yields systems with 
delayable actions. 
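
The three urgency types can be summarized by how the deadline is derived from the guard. The Python sketch below assumes a guard that is a closed interval `(lo, hi)` on one clock and uses `None` for an empty deadline. The helper names `deadline_for` and `split` are hypothetical.

```python
import math

def deadline_for(guard, urgency):
    """Deadline implied by an urgency type for an interval guard (lo, hi)."""
    lo, hi = guard
    if urgency == "lazy":
        return None                      # never urgent
    if urgency == "eager":
        return guard                     # urgent as soon as enabled
    if urgency == "delayable":
        # Urgent at the falling edge of the guard; an unbounded guard has none.
        return None if math.isinf(hi) else (hi, hi)
    raise ValueError("unknown urgency type: " + urgency)

def split(guard, deadline):
    """Express an action (guard, deadline) as a choice of typed actions.

    Returns (urgency, guard, deadline) triples: a lazy action with the
    original guard, plus an eager action whose guard is the deadline.
    """
    parts = [("lazy", guard, None)]
    if deadline is not None:
        parts.append(("eager", deadline, deadline))
    return parts

parts = split((2, 5), (4, 5))
```

The result of `split` contains only lazy and eager actions, and each triple agrees with `deadline_for`, which is the simplification the paragraph above describes.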



Discussion 

The distinction between the urgency preserving and the flexible approach seems to be
an important one and is related to the ultimate purpose of the specification. 
When a complete specification is sought, in view of analysis and verification, it 
is reasonable to consider that the violation of component deadlines is an error. 
On the contrary, if the purpose of the specification is to derive a system which is 
correct with respect to given criteria, knowing the behavior of its components, 
the flexible approach is appropriate. This approach provides a basis for con- 
structing timed systems that satisfy the two sanity properties, time reactivity 
and activity preservation. It is very close to synthesis and can be combined with 
automatic synthesis techniques. 

An important outcome of this work is that composition operators for unti- 
med systems admit different timed extensions due to the possibility of controlling 
waiting times and “predicting” the future. The use of modalities in guards dra- 
stically increases succinctness in modeling and is crucial for compositionality. It 
does not imply extra expressive power for simple classes of timed systems, where 
quantification over time in guards can be eliminated. 

The definition of different synchronization modes has been motivated by the 
study of high level specification languages for timed systems, such as Timed 
Petri nets and their various extensions [SDdSS94,SDLdSS96,JLSIR97]. We have 
shown that the proposed framework is a basis for the study of the underlying 
semantics and composition techniques; if they are bounded, then they can be 
represented as timed systems with finite control. 

An outstanding fact is that the combined use of the different synchronization
modes drastically helps keep the complexity of the discrete state space of the
descriptions low [BST97]. Both max-synchronization and min-synchronization
can be expressed in terms of and-synchronization but this requires additional
states and transitions. Furthermore, this destroys compositionality, in the sense 
that timed specifications cannot be obtained from untimed specifications by 
preserving the control structure. 

We believe that max-synchronization and min-synchronization are very powerful
primitives for the specification of asynchronously cooperating timed systems.
The use of and-synchronization is appropriate when a tight synchronization
between the components is sought. The other two synchronization modes
allow avoiding “clashes” in cooperation, for systems of loosely coupled compo- 
nents. For instance, max-synchronization corresponds to timed rendez-vous and 
can be used to obtain, in a straightforward manner, timed extensions of
asynchronously communicating untimed systems.

The presented framework requires further validation by examples and prac- 
tice. We are currently applying the flexible approach to the compositional gene- 
ration of timed models of real-time applications and in particular, to scheduling. 
Timed Automata



Rajeev Alur* 



Abstract. Model checking is emerging as a practical tool for automated
debugging of complex reactive systems such as embedded controllers
and network protocols (see [23] for a survey). Traditional techniques for
model checking do not admit an explicit modeling of time, and are thus
unsuitable for analysis of real-time systems whose correctness depends on
relative magnitudes of different delays. Consequently, timed automata [7]
were introduced as a formal notation to model the behavior of real-time
systems. Their definition provides a simple way to annotate state-transition
graphs with timing constraints using finitely many real-valued clock
variables. Automated analysis of timed automata relies on the construction
of a finite quotient of the infinite space of clock valuations. Over the years,
the formalism has been extensively studied leading to many results
establishing connections to circuits and logic, and much progress has been
made in developing verification algorithms, heuristics, and tools. This
paper provides a survey of the theory of timed automata, and their role
in specification and verification of real-time systems.



1 Modeling 

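Transition systems. We model discrete systems by state-transition graphs
whose transitions are labeled with event symbols. A transition system $S$ is a
tuple $\langle Q, Q^0, \Sigma, \rightarrow \rangle$, where $Q$ is a set of states, $Q^0 \subseteq Q$ is a set of initial states,
$\Sigma$ is a set of labels (or events), and $\rightarrow \subseteq Q \times \Sigma \times Q$ is a set of transitions. The
system starts in an initial state, and if $q \xrightarrow{a} q'$ then the system can change its
state from $q$ to $q'$ on event $a$. We write $q \rightarrow q'$ if $q \xrightarrow{a} q'$ for some label $a$. The
state $q'$ is reachable from the state $q$ if $q \rightarrow^* q'$. The state $q$ is a reachable state
of the system if $q$ is reachable from some initial state.

A complex system can be described as a product of interacting transition
systems. Let $S_1 = \langle Q_1, Q_1^0, \Sigma_1, \rightarrow_1 \rangle$ and $S_2 = \langle Q_2, Q_2^0, \Sigma_2, \rightarrow_2 \rangle$ be two transition
systems. Then, the product, denoted $S_1 \| S_2$, is $\langle Q_1 \times Q_2, Q_1^0 \times Q_2^0, \Sigma_1 \cup \Sigma_2, \rightarrow \rangle$
where $(q_1, q_2) \xrightarrow{a} (q_1', q_2')$ if either (i) $a \in \Sigma_1 \cap \Sigma_2$ and $q_1 \xrightarrow{a}_1 q_1'$ and $q_2 \xrightarrow{a}_2 q_2'$,
or (ii) $a \in \Sigma_1 \setminus \Sigma_2$ and $q_1 \xrightarrow{a}_1 q_1'$ and $q_2' = q_2$, or (iii) $a \in \Sigma_2 \setminus \Sigma_1$ and $q_2 \xrightarrow{a}_2 q_2'$
and $q_1' = q_1$. Observe that the symbols that belong to the alphabets of both the
automata are used for synchronization.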
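
The product construction translates directly into code. The Python sketch below represents a transition system as a tuple of sets `(states, initial, labels, transitions)`, which is an encoding chosen for this example, and applies the three cases: shared labels synchronize, private labels move one component only. The two small systems at the end are made up for illustration.

```python
from itertools import product as cartesian

def compose(ts1, ts2):
    """Product of two transition systems given as (states, initial, labels, transitions)."""
    Q1, I1, S1, T1 = ts1
    Q2, I2, S2, T2 = ts2
    trans = set()
    for (q1, a, r1) in T1:
        if a in S2:
            # (i) shared label: both components move together
            trans |= {((q1, q2), a, (r1, r2)) for (q2, b, r2) in T2 if b == a}
        else:
            # (ii) label private to the first component
            trans |= {((q1, q2), a, (r1, q2)) for q2 in Q2}
    for (q2, a, r2) in T2:
        if a not in S1:
            # (iii) label private to the second component
            trans |= {((q1, q2), a, (q1, r2)) for q1 in Q1}
    return (set(cartesian(Q1, Q2)), set(cartesian(I1, I2)), S1 | S2, trans)

def reachable(ts):
    """States reachable from some initial state."""
    _, init, _, trans = ts
    seen, frontier = set(init), list(init)
    while frontier:
        q = frontier.pop()
        for (p, _, r) in trans:
            if p == q and r not in seen:
                seen.add(r)
                frontier.append(r)
    return seen

# Two hypothetical components that synchronize on the label "sync".
ts1 = ({0, 1}, {0}, {"a", "sync"}, {(0, "a", 1), (1, "sync", 0)})
ts2 = ({"x", "y"}, {"x"}, {"b", "sync"}, {("x", "b", "y"), ("y", "sync", "x")})
prod = compose(ts1, ts2)
```

In this example the only `sync` transition of the product leaves `(1, "y")`, the one state in which both components are ready for it.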
Transition systems with timing constraints. To express system behaviors
with timing constraints, we consider finite graphs augmented with a finite set of
(real-valued) clocks. The vertices of the graph are called locations, and edges are
called switches. While switches are instantaneous, time can elapse in a location.
A clock can be reset to zero simultaneously with any switch. At any instant, the
reading of a clock equals the time elapsed since the last time it was reset. With
each switch we associate a clock constraint, and require that the switch may be
taken only if the current values of the clocks satisfy this constraint. With each
location we associate a clock constraint called its invariant, and require that
time can elapse in a location only as long as its invariant stays true. Before we
define the timed automata formally, let us consider a simple example.



Fig. 1. A timed automaton with 2 clocks

Consider the timed automaton of Figure 1 with two clocks. The clock $x$ gets set
to 0 each time the system switches from $s_0$ to $s_1$ on symbol $a$. The invariant
$(x < 1)$ associated with the locations $s_1$ and $s_2$ ensures that the $c$-labeled switch from
$s_2$ to $s_3$ happens within time 1 of the preceding $a$. Resetting another independent
clock $y$ together with the $b$-labeled switch from $s_1$ to $s_2$ and checking its value
on the $d$-labeled switch from $s_3$ to $s_0$ ensures that the delay between $b$ and
the following $d$ is always greater than 2. Notice that in the above example, to
constrain the delay between $a$ and $c$ and between $b$ and $d$ the system does not
put any explicit bounds on the time difference between $a$ and the following $b$, or
$c$ and the following $d$. This is an important advantage of having multiple clocks
which can be set independently of one another.

Clock constraints and clock interpretations. To define timed automata
formally, we need to say what type of clock constraints are allowed as invariants
and enabling conditions. For a set $X$ of clocks, the set $\Phi(X)$ of clock constraints
$\varphi$ is defined by the grammar

$\varphi := x \leq c \mid c \leq x \mid x < c \mid c < x \mid \varphi_1 \wedge \varphi_2$,

where $x$ is a clock in $X$ and $c$ is a constant in $\mathbb{Q}$. A clock interpretation $\nu$ for a set
$X$ of clocks assigns a real value to each clock; that is, it is a mapping from $X$ to
the set $\mathbb{R}$ of nonnegative reals. For $\delta \in \mathbb{R}$, $\nu + \delta$ denotes the clock interpretation
which maps every clock $x$ to the value $\nu(x) + \delta$. For $Y \subseteq X$, $\nu[Y := 0]$ denotes
the clock interpretation for $X$ which assigns 0 to each $x \in Y$, and agrees with $\nu$
over the rest of the clocks.
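
These definitions map onto a few small functions. In the Python sketch below a clock constraint is a list of atoms `(clock, op, constant)` read as a conjunction, and a clock interpretation is a dictionary from clock names to values. Both encodings, and the names `satisfies`, `elapse` and `reset`, are assumptions made for illustration.

```python
import operator

# The four comparison forms allowed by the grammar, written with the clock on the left.
COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def satisfies(nu, constraint):
    """True if the interpretation nu satisfies every atom of the conjunction."""
    return all(COMPARE[op](nu[x], c) for (x, op, c) in constraint)

def elapse(nu, delta):
    """The interpretation nu + delta: every clock advances by delta."""
    return {x: t + delta for x, t in nu.items()}

def reset(nu, clocks):
    """The interpretation nu[Y := 0]: clocks in Y are set to 0, others unchanged."""
    return {x: (0 if x in clocks else t) for x, t in nu.items()}

nu = {"x": 0.5, "y": 2.5}
phi = [("x", "<", 1), ("y", ">", 2)]  # x < 1 and 2 < y
```

Here `nu` satisfies `phi`, but `elapse(nu, 0.5)` does not, because `x` reaches 1. Resetting `{"y"}` leaves `x` untouched.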
Syntax and semantics. A timed automaton $A$ is a tuple $\langle L, L^0, \Sigma, X, I, E \rangle$,
where

— $L$ is a finite set of locations,

— $L^0 \subseteq L$ is a set of initial locations,

— $\Sigma$ is a finite set of labels,

— $X$ is a finite set of clocks,

— $I$ is a mapping that labels each location $s$ with some clock constraint in
$\Phi(X)$, and

— $E \subseteq L \times \Sigma \times \Phi(X) \times 2^X \times L$ is a set of switches. A switch $\langle s, a, \varphi, \lambda, s' \rangle$
represents an edge from location $s$ to location $s'$ on symbol $a$. $\varphi$ is a clock
constraint over $X$ that specifies when the switch is enabled, and the set
$\lambda \subseteq X$ gives the clocks to be reset with this switch.

The semantics of a timed automaton $A$ is defined by associating a transition
system $S_A$ with it. A state of $S_A$ is a pair $(s, \nu)$ such that $s$ is a location of
$A$ and $\nu$ is a clock interpretation for $X$ such that $\nu$ satisfies the invariant $I(s)$.
The set of all states of $A$ is denoted $Q_A$. A state $(s, \nu)$ is an initial state if $s$ is
an initial location of $A$ and $\nu(x) = 0$ for all clocks $x$. There are two types of
transitions in $S_A$:

Elapse of time: for a state $(s, \nu)$ and a real-valued time increment $\delta \geq 0$,
$(s, \nu) \xrightarrow{\delta} (s, \nu + \delta)$ if for all $0 \leq \delta' \leq \delta$, $\nu + \delta'$ satisfies the invariant $I(s)$.

Location switch: for a state $(s, \nu)$ and a switch $\langle s, a, \varphi, \lambda, s' \rangle$ such that $\nu$
satisfies $\varphi$, $(s, \nu) \xrightarrow{a} (s', \nu[\lambda := 0])$.

Thus, $S_A$ is a transition system with label-set $\Sigma \cup \mathbb{R}$. For instance, for the timed
automaton of Figure 1, the state-space of the associated transition system is
$\{s_0, s_1, s_2, s_3\} \times \mathbb{R}^2$, the label-set is $\{a, b, c, d\} \cup \mathbb{R}$, and sample transitions are

$(s_0, 0, 0) \xrightarrow{1.2} (s_0, 1.2, 1.2) \xrightarrow{a} (s_1, 0, 1.2) \xrightarrow{0.7} (s_1, 0.7, 1.9) \xrightarrow{b} (s_2, 0.7, 0)$

Note the time-additivity property: if $q \xrightarrow{\delta} q'$ and $q' \xrightarrow{\varepsilon} q''$ then $q \xrightarrow{\delta + \varepsilon} q''$.
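
The two transition rules can be exercised on the automaton of Figure 1. The Python sketch below encodes that automaton as it is described in the prose (the figure itself is not reproduced here, so the exact edge annotations are an assumption) and replays the sample transitions. Exact rationals from the standard `fractions` module avoid floating-point rounding. Since the constraints are conjunctions of bounds and clocks advance uniformly, checking the invariant at the end of a delay covers all intermediate points.

```python
from fractions import Fraction as F
import operator

COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Invariants per location and switches (source, label, guard, resets, target),
# reconstructed from the textual description of Figure 1.
INVARIANT = {"s0": [], "s1": [("x", "<", 1)], "s2": [("x", "<", 1)], "s3": []}
SWITCHES = [
    ("s0", "a", [], {"x"}, "s1"),
    ("s1", "b", [], {"y"}, "s2"),
    ("s2", "c", [], set(), "s3"),
    ("s3", "d", [("y", ">", 2)], set(), "s0"),
]

def holds(nu, constraint):
    return all(COMPARE[op](nu[x], c) for (x, op, c) in constraint)

def elapse(state, delta):
    """Elapse of time: allowed only while the location invariant stays true."""
    loc, nu = state
    later = {x: t + delta for x, t in nu.items()}
    if delta < 0 or not holds(later, INVARIANT[loc]):
        raise ValueError("time cannot elapse that far")
    return (loc, later)

def switch(state, label):
    """Location switch: the guard must hold; the listed clocks are reset."""
    loc, nu = state
    for (src, a, guard, resets, dst) in SWITCHES:
        if src == loc and a == label and holds(nu, guard):
            after = {x: (F(0) if x in resets else t) for x, t in nu.items()}
            if holds(after, INVARIANT[dst]):
                return (dst, after)
    raise ValueError("switch not enabled")

# Replay the sample run: delay 1.2, a, delay 0.7, b.
run = [("s0", {"x": F(0), "y": F(0)})]
run.append(elapse(run[-1], F("1.2")))
run.append(switch(run[-1], "a"))
run.append(elapse(run[-1], F("0.7")))
run.append(switch(run[-1], "b"))
```

The final state is location `s2` with `x = 7/10` and `y = 0`, as in the sample run. From there a further delay of 0.3 is rejected, because the invariant `x < 1` would be violated.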

Remark 1 (Nonzenoness). We have omitted requirements on the definition
necessary for executability. First, when the invariant of a location is violated, some
outgoing edge must be enabled. Second, from every reachable state, the automa- 
ton should admit the possibility of time to diverge. For example, the automaton 
should not enforce infinitely many events in a finite interval of time. Automata 
satisfying this operational requirement are called nonZeno. The interested reader 
is referred to [1,29,11]. ■ 

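Product construction. We proceed to define a product construction for timed
automata so that a complex system can be defined as a product of component
systems. Let $A_1 = \langle L_1, L_1^0, \Sigma_1, X_1, I_1, E_1 \rangle$ and $A_2 = \langle L_2, L_2^0, \Sigma_2, X_2, I_2, E_2 \rangle$ be two
timed automata. Assume that the clock sets $X_1$ and $X_2$ are disjoint. Then, the
product automaton $A_1 \| A_2$ is $\langle L_1 \times L_2, L_1^0 \times L_2^0, \Sigma_1 \cup \Sigma_2, X_1 \cup X_2, I, E \rangle$, where
$I(s_1, s_2) = I_1(s_1) \wedge I_2(s_2)$ and the switches are defined by:

1. for $a \in \Sigma_1 \cap \Sigma_2$, for every $\langle s_1, a, \varphi_1, \lambda_1, s_1' \rangle$ in $E_1$ and $\langle s_2, a, \varphi_2, \lambda_2, s_2' \rangle$ in $E_2$,
$E$ has $\langle (s_1, s_2), a, \varphi_1 \wedge \varphi_2, \lambda_1 \cup \lambda_2, (s_1', s_2') \rangle$.

2. for $a \in \Sigma_1 \setminus \Sigma_2$, for every $\langle s, a, \varphi, \lambda, s' \rangle$ in $E_1$ and every $t$ in $L_2$, $E$ has
$\langle (s, t), a, \varphi, \lambda, (s', t) \rangle$.

3. for $a \in \Sigma_2 \setminus \Sigma_1$, for every $\langle s, a, \varphi, \lambda, s' \rangle$ in $E_2$ and every $t$ in $L_1$, $E$ has
$\langle (t, s), a, \varphi, \lambda, (t, s') \rangle$.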

Thus, locations of the product are pairs of component-locations, and the invari- 
ant of a compound location is the conjunction of the invariants of the component 
locations. The switches are obtained by synchronizing the switches with identical 
labels. 
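
A minimal Python sketch of this product construction follows. The encoding is an assumption made for illustration only: an automaton is a dictionary, guards and invariants are tuples of atomic constraints written as strings (so conjunction is tuple concatenation), reset sets are frozensets, and the function name compose and the two small component automata are hypothetical.

```python
def compose(a1, a2):
    """Syntactic product of two timed automata with disjoint clock sets."""
    if a1["clocks"] & a2["clocks"]:
        raise ValueError("clock sets must be disjoint")
    shared = a1["sigma"] & a2["sigma"]
    edges = set()
    for (s1, a, g1, r1, t1) in a1["edges"]:
        if a in shared:
            # Shared label: both components move together.
            for (s2, b, g2, r2, t2) in a2["edges"]:
                if b == a:
                    edges.add(((s1, s2), a, g1 + g2, r1 | r2, (t1, t2)))
        else:
            # Label private to the first component: the second stays put.
            for t in a2["locs"]:
                edges.add(((s1, t), a, g1, r1, (t1, t)))
    for (s2, a, g2, r2, t2) in a2["edges"]:
        if a not in shared:
            # Label private to the second component: the first stays put.
            for t in a1["locs"]:
                edges.add(((t, s2), a, g2, r2, (t, t2)))
    pairs = [(s, t) for s in a1["locs"] for t in a2["locs"]]
    return {
        "locs": set(pairs),
        "init": {(s, t) for s in a1["init"] for t in a2["init"]},
        "sigma": a1["sigma"] | a2["sigma"],
        "clocks": a1["clocks"] | a2["clocks"],
        # Invariant of a pair is the conjunction of the component invariants.
        "inv": {(s, t): a1["inv"][s] + a2["inv"][t] for (s, t) in pairs},
        "edges": edges,
    }

# Two hypothetical components that synchronize on the label "a".
left = {"locs": {"s0", "s1"}, "init": {"s0"}, "sigma": {"a", "b"},
        "clocks": {"x"}, "inv": {"s0": ("x<=1",), "s1": ()},
        "edges": [("s0", "a", ("x<=1",), frozenset({"x"}), "s1")]}
right = {"locs": {"t0", "t1"}, "init": {"t0"}, "sigma": {"a", "c"},
         "clocks": {"y"}, "inv": {"t0": (), "t1": ("y<=3",)},
         "edges": [("t0", "a", (), frozenset(), "t1"),
                   ("t1", "c", ("y>=2",), frozenset({"y"}), "t0")]}
both = compose(left, right)
```

As in the text, the construction is purely syntactic: it conjoins guards and invariants without checking whether the resulting constraints are satisfiable.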

Train-Gate Controller Example. We consider an example of an automatic
controller that opens and closes a gate at a railroad crossing. The system is
composed of three components: Train, Gate and Controller as shown in
Figure 2. The safety correctness requirement for the system is that whenever
the train is inside the gate, the gate should be closed. This corresponds to
establishing that in every reachable state, if the location of Train is $s_2$ then the
location of Gate should be $t_2$. Observe that a location violating this requirement
is reachable in the product graph. For example, there is an edge from the initial
location $(s_0,t_0,u_0)$ to $(s_1,t_0,u_1)$, and from $(s_1,t_0,u_1)$ to $(s_2,t_0,u_1)$,
corresponding to the scenario in which the event approach is immediately followed
by the event in. This is because our product is simply a syntactic operation that
annotates product locations with conjunctions of invariants, and product edges
with conjunctions of enabling conditions, without any analysis. If we consider the
timing information, we can establish that the event approach cannot be immediately
followed by the event in: in the location $(s_1,t_0,u_1)$ both clocks x and z have the
same value, and hence the event lower with guard z = 1 is guaranteed to precede
the event in with guard x > 2. The computational problem in timing verification
is to make such deductions by analyzing the timing constraints.

Remark 2 (Compositionality). For communication between system components,
many competing alternatives to the definition used in this paper exist. The choice
of synchronization primitives is somewhat orthogonal to the problem of analysis
of timing constraints, and the algorithmic techniques for timed automata can be
applied to other models. To model open real-time systems (i.e. those interacting
with the environment), one needs to make a distinction between which events are
controlled by the system and which events are controlled by the environment.
Such a compositional framework provides foundations to decompose the analysis
problem into simpler problems [44,11,43]. Issues pertaining to the impact of
timing on synchronization are studied in [19]. ■



2 Reachability Analysis 

A location s of the timed automaton A is said to be reachable if some state q with
location component s is a reachable state of the transition system $S_A$. The input
to the reachability problem consists of a timed automaton A and a set $L^F \subseteq L$
of target locations of A. The reachability problem is to determine whether or
not some target location is reachable. Verification of safety requirements of
real-time systems can be formulated as reachability problems for timed automata,
as illustrated in the train-gate example. Since the transition system $S_A$ of a
timed automaton is infinite, our solution to the reachability problem involves
construction of finite quotients.

Fig. 2. Train-gate controller

Time-abstract transition system. The transition system $S_A$ of a timed
automaton A has infinitely many states and infinitely many symbols. As a first
step, we define another transition system, called the time-abstract transition
system and denoted $U_A$, whose transitions are labeled only with the symbols in $\Sigma$
by hiding the labels denoting the time increments. The state-space of $U_A$ equals
the state-space $Q_A$ of $S_A$. The set of initial states of $U_A$ equals the set of initial
states of $S_A$. The set of labels of $U_A$ equals the set $\Sigma$ of labels of A. The
transition relation of $U_A$ is the relation $\Rightarrow$: for states $q$ and $q'$ and a label $a$,
$q \stackrel{a}{\Rightarrow} q'$ iff there exists a state $q''$ and a time value $\delta \in \mathbb{R}$ such that
$q \xrightarrow{\delta} q'' \xrightarrow{a} q'$ holds in the transition system $S_A$. In the reachability problem
for timed automata, we wish to determine reachability of target locations. It follows
that to solve reachability problems, we can consider the time-abstract transition
system $U_A$ instead of $S_A$.

Stable quotients. While the time-abstract transition system $U_A$ has only
finitely many labels, it still has infinitely many states. To address this problem, we
consider equivalence relations over the state-space $Q_A$. An equivalence relation
$\sim$ over the state-space $Q_A$ is said to be stable iff whenever $q \sim u$ and $q \stackrel{a}{\Rightarrow} q'$,
there exists a state $u'$ such that $u \stackrel{a}{\Rightarrow} u'$ and $q' \sim u'$. The quotient of $U_A$ with
respect to a stable partition $\sim$ is the transition system $[U_A]_\sim$: states of $[U_A]_\sim$ are
the equivalence classes of $\sim$, an equivalence class $\pi$ is an initial state of $[U_A]_\sim$ if
$\pi$ contains an initial state of $U_A$, the set of labels is $\Sigma$, and $[U_A]_\sim$ contains an
$a$-labeled transition from the equivalence class $\pi$ to the class $\pi'$ if for some $q \in \pi$
and $q' \in \pi'$, $q \stackrel{a}{\Rightarrow} q'$ holds in $U_A$.

To reduce the reachability problem $(A, L^F)$ to a reachability problem over the
quotient with respect to $\sim$, we need to ensure, apart from stability, that $\sim$
does not equate target states with non-target states. An equivalence relation
$\sim$ is said to be $L^F$-sensitive, for a set $L^F \subseteq L$ of target locations, if whenever
$(s,\nu) \sim (s',\nu')$, either both $s$ and $s'$ belong to $L^F$, or both $s$ and $s'$ do not belong
to $L^F$. Consequently, to solve the reachability problem $(A, L^F)$, we search for an
equivalence relation $\sim$ that is stable, $L^F$-sensitive, and has only finitely many
equivalence classes.
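
The stability condition and the quotient can be illustrated on a finite labeled transition system. The sketch below is a simplification with hypothetical names (is_stable, quotient): the state-space of a timed automaton is infinite, so the code only demonstrates the definitions on small explicit graphs, with a partition given as a list of nonempty blocks.

```python
def block_index(partition):
    """Map every state to the index of the block that contains it."""
    return {q: i for i, block in enumerate(partition) for q in block}

def is_stable(partition, transitions):
    """True iff all states of a block offer the same (label, target block) moves."""
    block_of = block_index(partition)

    def moves(q):
        return frozenset((a, block_of[t]) for (p, a, t) in transitions if p == q)

    return all(len({moves(q) for q in block}) == 1 for block in partition)

def quotient(partition, transitions):
    """Transitions between blocks, induced by transitions between members."""
    block_of = block_index(partition)
    return {(block_of[p], a, block_of[t]) for (p, a, t) in transitions}

# States 0 and 1 behave alike, and so do states 2 and 3.
partition = [{0, 1}, {2, 3}]
stable_system = {(0, "a", 2), (1, "a", 3), (2, "b", 0), (3, "b", 1)}
# Here state 1 cannot match the a-move of state 0.
unstable_system = {(0, "a", 2), (2, "b", 0)}
```

For a stable partition, a block is reachable in the quotient exactly when one of its member states is reachable, which is why stability is required before searching the quotient.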

Region equivalence. We define an equivalence relation on the state-space of
an automaton that equates two states with the same location if they agree on
the integral parts of all clock values and on the ordering of the fractional parts
of all clock values. The integral parts of the clock values are needed to determine
whether or not a particular clock constraint is met, whereas the ordering of the
fractional parts is needed to decide which clock will change its integral part
first. For example, if two clocks x and y are between 0 and 1 in a state, then a
transition with clock constraint (x = 1) can be followed by a transition with clock
constraint (y = 1), depending on whether or not the current clock values satisfy
(x < y). The integral parts of clock values can get arbitrarily large. But if a clock
x is never compared with a constant greater than c, then its actual value, once
it exceeds c, is of no consequence in deciding the allowed switches. Here, we are
assuming that all clock constraints involve comparisons with integer constants (if
the clock constraints involve rational constants, we can multiply each constant
by the least common multiple of denominators of all the constants).

Now we formalize this notion. For any $\delta \in \mathbb{R}$, $fr(\delta)$ denotes the fractional part of
$\delta$, and $\lfloor\delta\rfloor$ denotes the integral part of $\delta$; that is, $\delta = \lfloor\delta\rfloor + fr(\delta)$. For each clock
$x \in X$, let $c_x$ be the largest integer $c$ such that $x$ is compared with $c$ in some
clock constraint appearing in an invariant or a guard. The equivalence relation $\cong$,
called the region equivalence, is defined over the set of all clock interpretations for
$X$. For two clock interpretations $\nu$ and $\nu'$, $\nu \cong \nu'$ iff all the following conditions
hold:

1. For all clocks $x \in X$, either $\lfloor\nu(x)\rfloor$ and $\lfloor\nu'(x)\rfloor$ are the same, or both $\nu(x)$
and $\nu'(x)$ exceed $c_x$.

2. For all clocks $x, y$ with $\nu(x) \le c_x$ and $\nu(y) \le c_y$, $fr(\nu(x)) \le fr(\nu(y))$ iff
$fr(\nu'(x)) \le fr(\nu'(y))$.

3. For all clocks $x \in X$ with $\nu(x) \le c_x$, $fr(\nu(x)) = 0$ iff $fr(\nu'(x)) = 0$.
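
The three conditions translate almost directly into code. The following Python sketch is an illustration only: the names region_equivalent and one_way, and the dictionary cmax holding the constants $c_x$, are hypothetical. As a design choice, the conditions are checked in both directions so that the result is symmetric even when a clock value equals $c_x$ exactly.

```python
from fractions import Fraction as Fr
from math import floor

def frac(value):
    """Fractional part of a nonnegative rational clock value."""
    return value - floor(value)

def one_way(nu, nu2, cmax):
    # Condition 1: same integral part, or both values exceed c_x.
    for x in cmax:
        both_large = nu[x] > cmax[x] and nu2[x] > cmax[x]
        if floor(nu[x]) != floor(nu2[x]) and not both_large:
            return False
    small = [x for x in cmax if nu[x] <= cmax[x]]
    for x in small:
        # Condition 3: a zero fractional part must be matched.
        if (frac(nu[x]) == 0) != (frac(nu2[x]) == 0):
            return False
        # Condition 2: the ordering of fractional parts must agree.
        for y in small:
            if (frac(nu[x]) <= frac(nu[y])) != (frac(nu2[x]) <= frac(nu2[y])):
                return False
    return True

def region_equivalent(nu, nu2, cmax):
    return one_way(nu, nu2, cmax) and one_way(nu2, nu, cmax)

cmax = {"x": 2, "y": 1}
inside_a = {"x": Fr(3, 10), "y": Fr(7, 10)}   # 0 < x < y < 1
inside_b = {"x": Fr(1, 10), "y": Fr(9, 10)}   # same region as inside_a
swapped = {"x": Fr(7, 10), "y": Fr(3, 10)}    # 0 < y < x < 1
beyond_a = {"x": Fr(5, 2), "y": Fr(1, 2)}     # x already exceeds c_x
beyond_b = {"x": Fr(7), "y": Fr(1, 4)}        # x exceeds c_x as well
```

With these values, inside_a and inside_b are equivalent, inside_a and swapped are not, and beyond_a and beyond_b are equivalent because the exact value of x no longer matters once it exceeds 2.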

A clock region for A is an equivalence class of clock interpretations induced by
$\cong$. The nature of the equivalence classes can be best understood through an
example. Consider a timed automaton with two clocks x and y with $c_x = 2$
and $c_y = 1$. The clock regions are shown in Figure 3. Note that there are only a
finite number of regions, at most $k! \cdot 2^k \cdot \prod_{x \in X}(2c_x + 2)$, where k is the number
of clocks. Thus, the number of clock regions is exponential in the encoding of
the clock constraints.
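
As a small worked calculation, the standard upper bound $k! \cdot 2^k \cdot \prod_{x \in X}(2c_x + 2)$ on the number of clock regions can be evaluated directly. The helper name region_bound is hypothetical, and math.prod assumes Python 3.8 or later.

```python
from math import factorial, prod

def region_bound(cmax):
    """Upper bound k! * 2^k * prod(2*c_x + 2) on the number of clock regions."""
    k = len(cmax)
    return factorial(k) * 2 ** k * prod(2 * c + 2 for c in cmax.values())

# Two clocks with c_x = 2 and c_y = 1, as in the example of Figure 3.
bound = region_bound({"x": 2, "y": 1})
```

For this example the bound evaluates to 2 · 4 · (6 · 4) = 192, whereas the legend of Figure 3 lists 6 + 14 + 8 = 28 regions, so the bound is far from tight here.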



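[Figure 3: the clock regions in the (x, y) plane for two clocks with $c_x = 2$ and
$c_y = 1$, comprising 6 corner points, e.g. [(0,1)]; 14 open line segments, e.g.
[0 < x = y < 1]; and 8 open regions, e.g. [0 < x < y < 1].]

Fig. 3. Clock regions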



Region automaton. Region equivalence relation $\cong$ over the clock
interpretations is extended to an equivalence relation over the state-space by requiring
equivalent states to have identical locations and region-equivalent clock
interpretations: $(s,\nu) \cong (s',\nu')$ iff $s = s'$ and $\nu \cong \nu'$. The key property of region
equivalence is its stability. The quotient $[U_A]_\cong$ of a timed automaton with
respect to the region equivalence is called the region automaton of A, and is denoted
$R(A)$. The number of equivalence classes of $\cong$ is finite, it is stable, and it is
$L^F$-sensitive irrespective of the choice of the target locations. It follows that to solve
the reachability problem $(A, L^F)$, we can search the finite region automaton
$R(A)$.

Complexity of reachability. Reachability can be solved in time linear in the
number of vertices and edges of the region automaton, which is linear in the
number of locations, exponential in the number of clocks, and exponential in
the encoding of the constants. Technically, the reachability problem is
PSPACE-complete. In fact, in [24], it is established that both sources of complexity, the
number of clocks and the magnitudes of the constants, render PSPACE-hardness
independently of each other.

Remark 3 (Choice of timing constraints and decidability). The clock constraints 
in the enabling conditions and invariants of a timed automaton compare clocks 
with constants. Such constraints allow us to express (constant) lower and upper 
bounds on delays. For any generalization of the constraints, our analysis technique
breaks down. In fact, if we allow constraints of the form x = 2y (a special
case of linear constraints over clocks), then the reachability problem becomes 
undecidable [7]. ■ 

Zone automata. One strategy to improve the region construction is to collapse
regions by considering convex unions of clock regions. A clock zone $\varphi$ is a set of
clock interpretations described by conjunction of constraints each of which puts
a lower or upper bound on a clock or on difference of two clocks. If A has k
clocks, then the set $\varphi$ is a convex set in the k-dimensional Euclidean space.
The reachability analysis using zones uses the following three operations:

- For two clock zones $\varphi$ and $\psi$, $\varphi \wedge \psi$ denotes the intersection of the two zones.

- For a clock zone $\varphi$, $\varphi\Uparrow$ denotes the set of interpretations $\nu + \delta$ for $\nu \in \varphi$
and $\delta \in \mathbb{R}$.

- For a subset $\lambda$ of clocks and a clock zone $\varphi$, $\varphi[\lambda := 0]$ denotes the set of
clock interpretations $\nu[\lambda := 0]$ for $\nu \in \varphi$.

A key property of the set of clock zones is closure under the above three
operations. A zone is a pair $(s,\varphi)$ for a location $s$ and a clock zone $\varphi$. We build a
transition system whose states are zones. Consider a zone $(s,\varphi)$ and a switch
$e = (s,a,\psi,\lambda,s')$ of A. Let $succ(\varphi,e)$ be the set of clock interpretations $\nu'$ such
that for some $\nu \in \varphi$, the state $(s',\nu')$ can be reached from the state $(s,\nu)$ by
letting time elapse and executing the switch $e$. That is, the set $(s', succ(\varphi,e))$
describes the successors of the zone $(s,\varphi)$ under the switch $e$. The set $succ(\varphi,e)$
can be computed using the three operations on clock zones as follows:

$$succ(\varphi,e) = (((\varphi \wedge I(s))\Uparrow) \wedge I(s) \wedge \psi)[\lambda := 0]$$

Thus, clock zones are effectively closed under successors with respect to switches.
A zone automaton has edges between zones $(s,\varphi)$ and $(s', succ(\varphi,e))$. For a timed
automaton A, the zone automaton $Z(A)$ is a transition system: states of $Z(A)$
are zones of A, for every initial location $s$ of A, the zone $(s,[X := 0])$ is an initial
location of $Z(A)$, and for every switch $e = (s,a,\psi,\lambda,s')$ of A and every clock
zone $\varphi$, there is a transition $((s,\varphi), a, (s', succ(\varphi,e)))$.
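
The successor computation, that is, intersecting the zone with the source invariant, letting time elapse, intersecting with the invariant and the guard, and resetting clocks, can be sketched with a simplified matrix encoding of zones. The sketch makes several assumptions that are not part of the text: entry D[i][j] is a nonstrict integer upper bound on $x_i - x_j$ with $x_0$ fixed at 0, strict bounds are ignored, and all function names and the example constraints are hypothetical.

```python
import math

INF = math.inf

def close(D):
    """Tighten all bounds by shortest paths; None if the zone is empty."""
    n = len(D)
    M = [row[:] for row in D]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                M[i][j] = min(M[i][j], M[i][m] + M[m][j])
    return None if any(M[i][i] < 0 for i in range(n)) else M

def meet(D, E):
    """Intersection of two zones."""
    return close([[min(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(D, E)])

def up(D):
    """Time elapse: drop the upper bound of every clock."""
    M = [row[:] for row in D]
    for i in range(1, len(M)):
        M[i][0] = INF
    return M

def reset(D, clocks):
    """Set the clocks with the given indices to 0 (input must be tightened)."""
    M = [row[:] for row in D]
    for i in clocks:
        for j in range(len(M)):
            if j != i:
                M[i][j] = M[0][j]
                M[j][i] = M[j][0]
        M[i][i] = 0
    return M

def succ(phi, inv, guard, resets):
    """(((phi and inv) elapsed) and inv and guard) with the resets applied."""
    z = meet(phi, inv)
    z = None if z is None else meet(up(z), inv)
    z = None if z is None else meet(z, guard)
    return None if z is None else reset(z, resets)

def unconstrained(k):
    """All interpretations of k nonnegative clocks."""
    n = k + 1
    return [[0 if (i == j or i == 0) else INF for j in range(n)] for i in range(n)]

zero = [[0] * 3 for _ in range(3)]          # x1 = x2 = 0
inv = unconstrained(2); inv[1][0] = 5       # invariant x1 <= 5
guard = unconstrained(2); guard[0][1] = -2  # guard x1 >= 2
result = succ(zero, inv, guard, {1})        # the switch resets x1
```

Under these assumptions the result describes $x_1 = 0$ and $2 \le x_2 \le 5$: the switch can be taken once both clocks have reached a common value between 2 and 5, after which $x_1$ is reset.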

Difference-bound matrices. Clock zones can be efficiently represented using
matrices [27]. Suppose the timed automaton A has k clocks, $x_1, \ldots, x_k$. Then a
clock zone is represented by a $(k+1) \times (k+1)$ matrix $D$. For each $i$, the entry
$D_{i0}$ gives an upper bound on the clock $x_i$, and the entry $D_{0i}$ gives a lower bound
on the clock $x_i$. For every pair $i,j$, the entry $D_{ij}$ gives an upper bound on the
difference of the clocks $x_i$ and $x_j$. To distinguish between a strict and a nonstrict
bound (i.e. to distinguish between constraints such as $x < 2$ and $x \le 2$), and
allow for the possibility of absence of a bound, define the bounds-domain $\mathbb{K}$ to
be $\mathbb{Z} \times \{0,1\} \cup \{\infty\}$. The constant $\infty$ denotes the absence of a bound, the bound
$(c,1)$, for $c \in \mathbb{Z}$, denotes the nonstrict bound $\le c$, and the bound $(c,0)$ denotes
the strict bound $< c$. A difference-bound matrix (DBM) $D$ is a $(k+1) \times (k+1)$
matrix whose entries are elements from $\mathbb{K}$. As an example, the clock
zone

$$(0 \le x_1 < 2) \wedge (0 < x_2 < 1) \wedge (x_1 - x_2 \ge 0)$$

can be represented by the matrix $D$ as well as by the matrix $D'$ (rows and
columns are indexed 0, 1, 2):

$$D = \begin{pmatrix} \infty & (0,1) & (0,0) \\ (2,0) & \infty & \infty \\ (1,0) & (0,1) & \infty \end{pmatrix}
\qquad
D' = \begin{pmatrix} (0,1) & (0,1) & (0,0) \\ (2,0) & (0,1) & (2,0) \\ (1,0) & (0,1) & (0,1) \end{pmatrix}$$

Observe that there are many implied constraints that are not reflected in the
matrix $D$, while the matrix $D'$ is obtained from the matrix $D$ by “tightening”
all the constraints. Such a tightening is obtained by observing that sum of the
upper bounds on the clock differences $x_i - x_j$ and $x_j - x_l$ is an upper bound on
the difference $x_i - x_l$ (for this purpose, the operations of $+$ and $<$ are extended
to the domain $\mathbb{K}$ of bounds). Matrices like $D'$ with tightest possible constraints
are called canonical. The DBM $D$ is satisfiable if it represents a nonempty clock
zone. Every satisfiable DBM has an equivalent canonical DBM. We use canonical
DBMs to represent clock zones. Given a DBM, using classical algorithms for
computing all-pairs shortest paths, we check whether the DBM is satisfiable,
and if so, convert it into a canonical form. Two canonical DBMs $D$ and $D'$ are
equivalent iff $D_{ij} = D'_{ij}$ for all $0 \le i,j \le k$. This test can be used during
the search to determine if a zone has been visited earlier. The representation
using canonical DBMs supports the required operations of conjunction, $\psi\Uparrow$,
and $\psi[\lambda := 0]$ efficiently (cf. [27]).
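
The satisfiability check and the conversion to canonical form can be sketched with the Floyd-Warshall all-pairs shortest-path algorithm over bounds with a strictness bit. The encoding is an assumption for illustration: a bound is a pair (c, bit) with bit 1 for nonstrict and 0 for strict, the absence of a bound is written (math.inf, 1), and the names add and canonical as well as the example zone are hypothetical rather than taken from [27].

```python
import math

INF = (math.inf, 1)    # absence of a bound
LE_ZERO = (0, 1)       # the bound "<= 0"

def add(b1, b2):
    """Sum of two bounds; the sum is strict if either summand is strict."""
    c = b1[0] + b2[0]
    return INF if c == math.inf else (c, min(b1[1], b2[1]))

def canonical(D):
    """Canonical form of the DBM D, or None if D is unsatisfiable.

    Bounds compare as tuples, so (c, 0) is tighter than (c, 1).
    """
    n = len(D)
    M = [[min(D[i][j], LE_ZERO) if i == j else D[i][j] for j in range(n)]
         for i in range(n)]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                M[i][j] = min(M[i][j], add(M[i][m], M[m][j]))
    # A diagonal entry tighter than "<= 0" signals a contradictory cycle.
    if any(M[i][i] < LE_ZERO for i in range(n)):
        return None
    return M

# Hypothetical zone: x1 >= 0, x2 >= 0, x1 <= 3, x2 - x1 < 1.
zone = [
    [INF,    (0, 1), (0, 1)],
    [(3, 1), INF,    INF],
    [INF,    (1, 0), INF],
]
tight = canonical(zone)

# Contradictory constraints on one clock: x1 < 1 and x1 >= 1.
empty = canonical([[INF, (-1, 1)], [(1, 0), INF]])
```

In the example, tightening derives the implied constraints $x_2 < 4$ and $x_1 - x_2 \le 3$, while the second matrix is reported as unsatisfiable.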

Theoretically, the number of zones is exponential in the number of regions, and 
thus, the zone automaton may be exponentially bigger than the region automa- 
ton. However, in practice, the zone automaton has fewer reachable vertices, and 
thus, leads to an improved performance. Furthermore, while the number of clock 
regions grows with the magnitudes of the constants used in the clock constraints, 
experience indicates that the number of reachable zones is relatively insensitive 
to the magnitudes of constants. 

Implementation. The input to a verification problem consists of a set of
component timed automata $A_i$, and the solution demands searching the region
automaton $R(\|_i A_i)$ or the zone automaton $Z(\|_i A_i)$. The actual search can be
performed by an on-the-fly enumerative engine or a BDD-based symbolic engine. We
briefly sketch implementation of the search in timed COSPAN [15]. Suppose the
input program $P$ consists of a collection of coordinating timed automata $A_i$. For
each $A_i$, let $A_i'$ be the automaton without any timing annotations. A preprocessor
generates a new program that consists of automata $A_i'$, together with the description of
a monitor automaton $A_R$ encoding the region construction or $A_Z$ encoding the
DBM-based zone construction. Suppose $\|_i A_i$ has k clocks, and all the constants
are bounded by c. The automaton $A_R$ has 2k variables: k variables ranging over
0..c that keep track of the integral parts of the clocks, and k variables ranging
over 1..k that give the ordering of the fractional parts. The automaton $A_Z$ has
$(k+1)^2$ variables ranging over $-c..c$ that keep track of the numerical entries
in the DBM and $(k+1)^2$ boolean variables that keep track of the strictness bit
for each matrix entry. The update rules for these variables refer to the
state-variables of the component automata. Searching the region automaton of $\|_i A_i$
is semantically equivalent to searching the product of $\|_i A_i'$ with $A_R$, while
searching the zone automaton of $\|_i A_i$ is semantically equivalent to searching the
product of $\|_i A_i'$ with $A_Z$. Following the preprocessing step, the search engine
of COSPAN is used to perform the search on the input program using BDDs
or using on-the-fly enumerative search. Experience shows that for enumerative
search the zone construction is preferable, while for symbolic search the region
construction is preferable.

Remark 4 (Dense vs discrete time). Our choice of time domain is $\mathbb{R}$, the set of
nonnegative real numbers. Alternatively, we could choose $\mathbb{Q}$, the set of rational
numbers, and all of the results stay unchanged. The key property of the time
domain, in our context, is its denseness, which implies that arbitrarily many
events can happen at different times in any interval of nonzero length. On the
other hand, if we choose $\mathbb{N}$, the set of nonnegative integers, to model time,
we have a discrete-time model, and the flavor of the analysis problems changes
quite a bit. In the dense-time model, reachability for timed automata is PSPACE,
while universality is undecidable; in the discrete-time case, reachability for
timed automata is still PSPACE, while universality is EXPSPACE. We believe that
discrete-time models, while appropriate for scheduling applications, are
inappropriate for modeling asynchronous applications such as asynchronous circuits. For
verification of real-time systems using discrete-time models, see, for instance, [28,
21]. In [34], it is established that under certain restrictions the timed reachability
problem has the same answer irrespective of choice between $\mathbb{N}$ and $\mathbb{R}$. ■

Remark 5 (Minimization). Suppose we wish to explicitly construct a
representation of the state-space of a timed automaton. Then, instead of building the
region or the zone automaton, we can employ a minimization algorithm that
constructs the coarsest stable refinement of a given initial partition by refining
it as needed [4,54,37,50]. ■

Remark 6 (Alternative Symbolic Representations). There have been many
attempts to combine BDD-based representation of discrete locations with
DBM-based representation of zones. Sample approaches include encoding DBMs using
BDDs with particular attention to bit patterns in the variable ordering [20], and
variants of BDDs specifically designed to represent clock constraints [18]. ■

3 Discussion 

We have summarized the basic techniques for analysis of timed automata (see 
also [41] for an introduction). We conclude by briefly discussing tools,
applications, and theoretical results.

Tools. A variety of tools exist for specification and verification of real-time 
systems. We list three that are most closely related to the approach discussed 
in this paper. The tool timed COSPAN is an automata-based modeling and
analysis tool developed at Bell Labs (see [15,13]). The tool Kronos, developed 
at VERIMAG, supports model checking of branching-time requirements [25]. 
The Uppaal toolkit is developed in collaboration between Aalborg University, 
Denmark and Uppsala University, Sweden [40] and allows checking of safety 
and bounded liveness properties. All these tools incorporate many additional 
heuristics for improving the performance. 

Applications. The methodology described in this paper is suitable for finding
logical errors in communication protocols and asynchronous circuits. Examples
of analyzed protocols include Philips audio transmission protocol, carrier-sense
multiple-access with collision detection, and Bang-Olufsen audio/video protocol
(a detailed description of these and other case studies can be obtained from the
homepages of Kronos or Uppaal). The application of COSPAN to verification
of the asynchronous communication on the STARI chip is reported in [49], and
to a scheduling problem in telecommunication software is reported in [14].

Automata-theoretic Verification. Reachability analysis discussed in
Section 2 is adequate to check safety properties of real-time systems. To verify
liveness properties such as “if a request occurs infinitely often, so does the
response” we need to consider nonterminating, infinite, executions. Specification
and verification of both safety and liveness properties can be formulated in a
uniform and elegant way using an automata-theoretic approach [52,39,7]. In this
approach, a timed automaton, possibly with acceptance conditions (e.g. Büchi),
is viewed as a generator of a timed language - a set of sequences in which a
real-valued time of occurrence is associated with each symbol. Verification
corresponds to queries about the timed language defined by the timed automaton
modeling the system. If the query is given by a timed automaton that accepts
undesirable behaviors, then the verification question reduces to checking emptiness
of the intersection, and can be solved in PSPACE. On the other hand, if the query
is given by a timed automaton that accepts all behaviors satisfying the desired
property, verification corresponds to testing inclusion of the two timed languages,
and is undecidable in general [7]. Decidability of the language-inclusion problem
can be ensured by requiring the specification automaton to be deterministic, or
an event-clock automaton.

Since the theory of regular (or $\omega$-regular) languages finds many applications
including modeling of discrete systems, many attempts have been made to develop
a corresponding theory of timed languages. Timed languages defined by timed
automata can be characterized using a timed version of S1S [53], timed regular
expressions [17], and timed temporal logics [36]. The complexity of different types
of membership problems for timed automata is studied in [16]. Timed languages
definable by timed automata are closed under union and intersection, but not
under complementation. This has prompted identification of subclasses such as
event-clock automata [9] with better closure properties.

Equivalence and Refinement Relations. While timed language equivalence
for timed automata is undecidable, stronger equivalences such as timed
bisimulation and simulation are decidable. For a timed automaton A, a timed bisimulation
is an equivalence relation $\sim$ on the state-space $Q_A$ such that whenever $q_1 \sim q_2$,
if $q_1 \xrightarrow{a} q_1'$ for $a \in \Sigma \cup \mathbb{R}$, then there exists $q_2'$ with $q_2 \xrightarrow{a} q_2'$ and $q_1' \sim q_2'$. While
the number of equivalence classes of the maximal timed bisimulation relation is
infinite, the problem of deciding whether there exists a timed bisimulation that
relates two specified initial states is, surprisingly, decidable [51] (the algorithm
involves analysis of the region automaton of the product space $Q(A) \times Q(A)$).
The same proof technique is useful to obtain algorithms for checking existence of
timed simulation [48] (timed simulation relations are useful for establishing
refinement between descriptions at different levels of abstractions). The complexity
of deciding timed (bi)simulation is EXPTIME. A hierarchy of approximations to
timed bisimulation relation can be defined on the basis of the number of clocks
that an observer must use to distinguish between two timed automata [6]. The
impact of the precision of the observer’s clocks on the distinguishing ability is
studied in [42].

Linear real-time temporal logics. Linear temporal logic (LTL) [46] is a
popular formalism for writing requirements regarding computations of reactive
systems. A variety of real-time extensions of LTL have been proposed for
writing requirements of real-time systems [45,38,10,8]. In particular, the real-time
temporal logic Metric Interval Temporal Logic (MITL) admits temporal
connectives such as always, eventually, and until, subscripted with intervals. A typical
bounded-response requirement that “every request p must be followed by a
response q within 3 time units” is expressed by the MITL formula $\Box(p \rightarrow \Diamond_{\le 3}\, q)$.
To verify whether a real-time system modeled as a timed automaton A
satisfies its specification given as a MITL formula $\varphi$, the model checking algorithm
constructs a timed automaton $A_{\neg\varphi}$ that accepts all timed words that violate
$\varphi$, and checks whether the product of A with $A_{\neg\varphi}$ has a nonempty language
[8]. The definition of MITL requires the subscripting intervals to be
nonsingular. In fact, admitting singular intervals as subscripts (e.g. formulas of the form
$\Box(p \rightarrow \Diamond_{=1}\, q)$) makes translation from MITL to timed automata impossible,
and the satisfiability and model checking problems for the resulting logic are
undecidable. See [31] for a recent survey of real-time temporal logics.
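
To make the bounded-response requirement concrete, the following Python sketch checks it on a single finite timed word. This is not the model checking algorithm of [8], which works on timed automata; it is only a trace-level illustration under assumed conventions: a timed word is a list of (time, set of propositions) pairs, the semantics is pointwise, and the function name bounded_response is hypothetical.

```python
def bounded_response(word, bound=3, request="p", response="q"):
    """Check that every request is followed by a response within `bound`."""
    for t, props in word:
        if request in props:
            # Look for a response at some time u with t <= u <= t + bound.
            answered = any(response in ps and t <= u <= t + bound
                           for u, ps in word)
            if not answered:
                return False
    return True

on_time = [(0, {"p"}), (2.5, {"q"}), (4, {"p", "q"})]
too_late = [(0, {"p"}), (3.5, {"q"})]
```

The first word satisfies the requirement, since each request is answered within 3 time units, while in the second word the only response arrives 3.5 time units after the request.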

Branching real-time temporal logics. Many tools for symbolic model
checking employ the branching-time logic CTL [22,47] as a specification language.
The real-time logic Timed Computation Tree Logic (TCTL) [3] allows temporal
connectives of CTL to be subscripted with intervals. For instance, the bounded
response property that “every request p must be followed by a response q within
3 time units” is expressed by the TCTL formula $\forall\Box(p \rightarrow \forall\Diamond_{\le 3}\, q)$. It turns
out that two states that are region-equivalent satisfy the same set of
TCTL-formulas. Consequently, given a timed automaton A and a TCTL-formula $\varphi$,
the computation of the set of states of A that satisfy $\varphi$ can be performed by a
labeling algorithm that labels the vertices of the region automaton $R(A)$ with
subformulas of $\varphi$ starting with innermost subformulas [3]. Alternatively, the
symbolic model checking procedure computes the set of states satisfying each
subformula by a fixpoint routine that manipulates zone constraints [35].

Probabilistic models. Probabilistic extensions of timed automata allow
modeling constraints such as “the delay between the input event a and the output
event b is distributed uniformly between 1 to 2 seconds” (cf. [2]). With
introduction of probabilities, the semantics of the verification question changes. Given
a probabilistic timed automaton A and a specification automaton $A_S$ that
accepts the undesirable behaviors, verification corresponds to establishing that the
probability that the run of the system A generates a word accepted by $A_S$ is
zero. A modification of the cycle detection algorithm on the region automaton
of the product of A and $A_S$ can solve this problem [2]. A similar approach works
for verifying TCTL properties of a probabilistic timed automaton. However, if
we introduce explicit probabilities in the requirements (e.g. event a will happen
within time 2 with probability at least 0.5), then model checking algorithms are
known only for a discrete model of time [26].

Hybrid systems. The model of timed automata has been extended so that
continuous variables other than clocks, such as temperature and imperfect clocks,
can be modeled. Hybrid automata are useful in modeling discrete controllers
embedded within continuously changing environment. Verification of hybrid
automata is undecidable in general. For the subclass of rectangular automata,
analysis is possible via language-preserving translation to timed automata [33], and
for the subclass of linear hybrid automata, analysis is possible based on symbolic
fixpoint computation using polyhedra [12]. See [5] for an introduction to the
theory, [32] for an introduction to the tool HyTech, and [30] for a survey.

Acknowledgements. My research on timed automata has been in collabora- 
tion with Costas Courcoubetis, David Dill, Tom Henzinger, Bob Kurshan, and 
many others. Many thanks to them, and to Salvatore La Torre for comments on

Abstract. Stålmarck’s method is a proof search algorithm, finding proofs
of propositional tautologies in a proof system called the Dilemma proof
system [1,2]. The search procedure is based on a notion of proof depth
corresponding to the degree of nestings of assumptions in proofs. Stålmarck’s
algorithm has been successfully used in industrial applications since 1989,
for example in verification of railway interlockings and of aircraft control
systems [3].

This talk is divided into three parts. 

Part I We discuss the proof system underlying Stålmarck’s method, the
Dilemma proof system. We introduce a notion of refutation graphs as a
framework for proof systems, and then define Dilemma by restrictions on
refutation graphs. The various restrictions will be discussed with respect
to proof complexity and be compared with restrictions defining other
proof systems.

A closed refutation graph is a rooted acyclic digraph, starting with a
set $\Delta$ of formulas to be refuted and successive extensions of $\Delta$ with new
consequences making all leaves in the refutation explicitly contradictory.
Informally, a refutation graph is built up from applications of:

Propagation

$$\frac{\Delta}{\Delta \cup \{A\}}$$

where $A$ is a logical consequence of the formulas in $\Delta$;

Split

$$\frac{\Delta}{\Delta \cup \{A_1\} \quad \cdots \quad \Delta \cup \{A_n\}}$$

where the disjunction $A_1 \vee \ldots \vee A_n$ is a logical consequence of the formulas
in $\Delta$;

Merge

$$\frac{\Delta_1 \quad \cdots \quad \Delta_n}{\Delta}$$

where $\Delta$ is the intersection of $\Delta_1 \ldots \Delta_n$.

Refutation graphs give a sound semantic framework for a variety of
proof systems, in that particular proof systems can be characterized in
terms of restrictions on refutation graphs.

Restrictions on refutation graphs

(i) propagations are restricted to instances of predefined schematic rules;

(ii) split assumptions are restricted to instances of predefined schematic
formulas;

(iii) merge applications are restricted to certain contexts;

(iv) the set of formulas allowed in proofs is restricted.

Part II In the second part we treat the efficiency of the search procedure
in Stålmarck’s method, the so-called i-saturation procedure, and compare
i-saturation with other search procedures such as backtracking.

The i-saturation algorithm is related to the notion of proof depth and
the related notion of hardness. Proof depth and hardness are defined
for proof systems obeying the so-called subformula property and are
restricted to series-parallel refutation graphs.

Part III The third part of this talk concerns the possibility to extend
Stålmarck’s method to more expressive logics, in particular to Quantified
Boolean Formulas (QBF). In this part we also discuss applications of the
extended algorithm to model checking.

Stålmarck’s method is based on a reduction of formulas to sets of
definitions, the definitions of all compound subformulas of the original formula.
This data structure is sometimes called triplets.

The extension of Stålmarck’s method to QBF is based on a reduction of
QBF to QBF triplets and then further from QBF triplets to propositional
logic triplets. The reduction combines three ideas of size reduction:

(i) scope-reduction of quantifiers, that is the application of reversed
prenex normal form transforms;

(ii) use of definitions in order to share common sub-expressions;

(iii) partial evaluation “on the fly”.



References 

1. G. Stalmarck. A system for determining propositional logic theorems 
by applying values and rules to triplets that are generated from a 
formula, 1989. Swedish Patent No. 467 076 (approved 1992), U.S. 
Patent No. 5 276 897 (approved 1994), European Patent No. 0403 
454 (approved 1995). 

2. M. Sheeran and G. Stalmarck. A tutorial on Stalmarck’s proof procedure 
for propositional logic. In Proceedings of FMCAD’98, Springer-Verlag 
LNCS vol. 1522, 1998. 

3. A. Boralv. The industrial success of verification tools based on 
Stalmarck’s method. In Proceedings of CAV’97, Springer-Verlag 
LNCS vol. 1427, 1997. 




Verification of Parameterized Systems by 
Dynamic Induction on Diagrams 



Zohar Manna and Henny B. Sipma 



Computer Science Department 
Stanford University 
Stanford, CA. 94305-9045 

{manna,sipma}@cs.stanford.edu 



Abstract. In this paper we present a visual approach to proving pro- 
gress properties of parameterized systems using induction on verification 
diagrams. The inductive hypothesis is represented by an automaton and 
is based on a state-dependent order on process indices, for increased flexi- 
bility. This approach yields more intuitive proofs for progress properties 
and simpler verification conditions that are more likely to be proved 
automatically. 



1 Introduction 

Verification diagrams represent a proof that a reactive system satisfies its tem- 
poral specification; they were proposed in [MP94] and generalized in [BMS95]. 
The purpose of a diagram is to provide a high-level proof outline that makes 
explicit the difficult parts of the proof, while hiding the tedious details. 

Parameterized systems are systems that consist of an unbounded number 
of processes that differ only in their process identifiers (process indices). Pro- 
ofs of safety properties over parameterized systems introduce universal force 
quantifiers in the verification conditions. On the other hand, proofs of progress 
properties for such systems usually introduce both universal force and existential 
force quantifiers in the verification conditions, making these proofs considerably 
harder. 

The validity of a progress property usually relies on the fairness of certain 
transitions. In the proof, these transitions must be identified, and they are repre- 
sented explicitly in a verification diagram. However, for parameterized systems, 
the validity of a progress property may depend on an unbounded number of 
distinct fair transitions, so an alternative representation must be used. 

One solution, proposed in [MP96], is to assert the existence of such transitions 
in the diagram without explicitly identifying them. However, this partly defeats 
the purpose of diagrams: the transitions now have to be identified at the 
theorem-proving level, by instantiating the existential quantifiers. This usually requires a 
substantial amount of user input at a level where intuition and insight into the 
program are of little help. 

In this paper, we suggest that progress properties can often be proven by 
induction on the process identifier. In many programs, processes are waiting for 
each other to achieve their goal in turn. A natural inductive hypothesis is thus 
that processes with higher priority than the current process will achieve their 
goal. In some cases, for example in the proof of progress properties for a leader 
election algorithm presented in [BLM97], standard mathematical induction with 
a fixed order on the process indices suffices. However, in many cases a more 
flexible order is required. 

The induction principle for diagrams proposed here extends the regular in- 
duction principle over the natural numbers by allowing a state-dependent order 
on the process indices. While in a proof of $\varphi[i]$ by regular induction, $\varphi[j]$ may 
be assumed only if $j \prec i$, in our diagram induction $\varphi[j]$ may be assumed if for 
every computation of the system eventually always $j \prec i$ holds. In a proof by 
diagram induction, this condition is incorporated in an automaton for the induc- 
tive hypothesis that constrains the set of computations for an arbitrary value of 
the parameter; this automaton is then conjoined with the main diagram. 

We illustrate the diagram induction principle by proving a progress property 
for a very simple parameterized system. In the last section we demonstrate a 
more complex application in the proof of accessibility for a fine-grained parame- 
terized algorithm for mutual exclusion. 

2 Preliminaries 

2.1 Computational Model 

Our computational model is that of fair transition systems [MP95]. A fair 
transition system (FTS) $\mathcal{S} : \langle V, \Theta, \mathcal{T}, \mathcal{J} \rangle$ consists of 

— $V$: A finite set of typed system variables. A state is a type-consistent 
interpretation of the system variables. The set of all states is called the state 
space and is designated by $\Sigma$. A first-order formula with free variables in $V$ 
is called an assertion. We write $s \models p$ if $s$ satisfies $p$. 

— $\Theta$: The initial condition, an assertion characterizing the initial states. 

— $\mathcal{T}$: A finite set of transitions. Each transition $\tau \in \mathcal{T}$ is a function 
$\tau : \Sigma \to 2^{\Sigma}$ mapping each state $s \in \Sigma$ into a (possibly empty) set of 
$\tau$-successor states, $\tau(s) \subseteq \Sigma$. Each transition $\tau$ is defined by a transition 
relation $\rho_\tau(V, V')$, a first-order formula in which the unprimed variables refer 
to the values in the current state $s$, and the primed variables refer to the values 
in the next state $s'$. Transitions may be parameterized, thus representing a 
possibly unbounded set of transitions differing only in their parameter. 

— $\mathcal{J} \subseteq \mathcal{T}$: A set of just (weakly fair) transitions$^1$. 

$^1$ To simplify the presentation we omit compassion (strong fairness) in this paper. 
A run of a fair transition system $\mathcal{S} : \langle V, \Theta, \mathcal{T}, \mathcal{J} \rangle$ is an infinite sequence 
of states $\sigma : s_0, s_1, s_2, \ldots$ such that $s_0 \models \Theta$, and for each $j \geq 0$, $s_{j+1}$ is a 
$\tau$-successor of $s_j$, that is, $s_{j+1} \in \tau(s_j)$ for some $\tau \in \mathcal{T}$. If $s_{j+1}$ is a $\tau$-successor of 
$s_j$ we say that transition $\tau$ is taken at position $j$. The set of runs of $\mathcal{S}$ is written 
$\mathcal{L}_r(\mathcal{S})$. 

A computation of a fair transition system $\mathcal{S}$ is a run $\sigma$ of $\mathcal{S}$ that satisfies 
the fairness requirement: for each transition $\tau \in \mathcal{J}$ it is not the case that $\tau$ is 
continuously enabled beyond some position $j$ in $\sigma$, but not taken beyond $j$. The 
set of computations of a system $\mathcal{S}$ is denoted by $\mathcal{L}(\mathcal{S})$, called the language of $\mathcal{S}$. 
A state is called $\mathcal{S}$-accessible if it appears in some computation of $\mathcal{S}$. 
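
The fairness requirement has a simple finite reading when a run is ultimately periodic. The following minimal Python sketch assumes the run is given as a lasso whose loop part repeats forever, and that for each loop position we know which transitions are enabled and which one is taken. The names `is_just`, `loop_enabled` and `loop_taken` are illustrative and do not come from the paper.

```python
from typing import Iterable, List, Set


def is_just(just: Iterable[str],
            loop_enabled: List[Set[str]],
            loop_taken: List[str]) -> bool:
    """Check the justice requirement on a lasso-shaped run.

    loop_enabled[p] is the set of transitions enabled at loop position p,
    loop_taken[p] is the transition taken at that position. Because the
    loop repeats forever, a transition is continuously enabled beyond some
    position exactly when it is enabled at every loop position.
    """
    for tau in just:
        continuously_enabled = all(tau in en for en in loop_enabled)
        taken_in_loop = tau in loop_taken
        if continuously_enabled and not taken_in_loop:
            return False  # enabled forever, yet never taken again
    return True
```

A run that passes this check for every just transition is a computation in the sense defined above; general (non-periodic) runs are outside the scope of the sketch.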

Example 1 

Figure 1 shows program SIMPLE, parameterized by $M$, written in the SPL language 
of [MP95]. Its body is the parallel composition of $M$ processes, indexed by $i$. 
The program has a global array variable $a$ that can be observed by all processes, 
all of whose elements are initialized to false. In addition, each process has a local 
variable $j$ that cannot be observed by any other process. 

It is straightforward to translate this program into an FTS. The system 
variables are an integer $M$, a boolean array $a$, an integer array $j$, containing the 
value of the local variable $j$ of each process, and an array $\pi$, containing the 
label, $\pi[i] \in \{\ell_0, \ell_1, \ell_2, \ell_3\}$, of the current statement of each process. Each 
program statement can be represented by a parameterized transition. For example, 
the statement labeled by $\ell_1$ is represented by the parameterized transition with 
transition relation 

$\rho_{\ell_1}[i] : \pi[i] = \ell_1 \land \pi'[i] = \ell_0 \land (a[j[i]] \lor i \leq j[i]) \land a' = a \land j' = j$ 

The objective of this simple program is for each process $i$ to set $a[i]$ to true, but 
only after all processes with smaller process indices have done so. We will prove 
that eventually each process completes its mission. ■ 

    in    M : integer where M > 0 
    local a : array [1..M] of boolean where a = false 

    ||  P[i] ::                          (i = 1, ..., M) 
        local j : integer where j = 1 
        ℓ0: for j = 1 to M do 
              ℓ1: await a[j] ∨ i ≤ j 
        ℓ2: a[i] := true 
        ℓ3: 

Fig. 1. Program SIMPLE 
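
The behaviour described in Example 1 can be explored with a small simulation. The Python sketch below is an illustration only: it assumes a round-robin scheduler (which is fair), encodes the labels $\ell_0, \ldots, \ell_3$ as the integers 0 to 3, and folds the increment of the loop counter into the step that passes the await statement. These choices, and the name `run_simple`, are assumptions of the sketch rather than part of the SPL semantics given in [MP95].

```python
from typing import List


def run_simple(M: int, max_rounds: int = 10_000) -> List[int]:
    """Simulate program SIMPLE under a round-robin scheduler.

    Returns the order in which processes set a[i] to true.
    Indices are 1-based, as in the paper; slot 0 is unused.
    """
    a = [False] * (M + 1)   # shared array a[1..M], all false initially
    j = [1] * (M + 1)       # local loop counter j of each process
    loc = [0] * (M + 1)     # current label of each process (0..3)
    order: List[int] = []
    for _ in range(max_rounds):
        if all(loc[i] == 3 for i in range(1, M + 1)):
            break           # every process has terminated
        for i in range(1, M + 1):       # one step per process per round
            if loc[i] == 0:             # l0: loop head
                loc[i] = 1 if j[i] <= M else 2
            elif loc[i] == 1:           # l1: await a[j] or i <= j
                if a[j[i]] or i <= j[i]:
                    j[i] += 1           # assumption: increment happens here
                    loc[i] = 0
            elif loc[i] == 2:           # l2: a[i] := true
                a[i] = True
                order.append(i)
                loc[i] = 3
    return order
```

Under these assumptions every process terminates and the array elements are set in increasing index order, which is the progress behaviour the paper sets out to prove for arbitrary $M$.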
2.2 Specification Language 

As specification language we use linear-time temporal logic (LTL). LTL formulas 
are interpreted over infinite sequences of states. A temporal formula [MP95] is 
constructed out of state formulas and temporal operators. Below we only give 
the semantics of those operators used in our examples. 

For an infinite sequence of states $\sigma : s_0, s_1, \ldots$, an assertion $p$, and temporal 
formulas $\varphi$ and $\psi$, 

$(\sigma, j) \models p$ iff $s_j \models p$, that is, $p$ holds on state $s_j$ 

$(\sigma, j) \models \Box \varphi$ iff $(\sigma, i) \models \varphi$ for all $i \geq j$ 

$(\sigma, j) \models \Diamond \varphi$ iff $(\sigma, i) \models \varphi$ for some $i \geq j$ 

An infinite sequence of states $\sigma$ satisfies a temporal formula $\varphi$, written $\sigma \models \varphi$, 
if $(\sigma, 0) \models \varphi$. Given an FTS $\mathcal{S}$, we say that an LTL formula $\varphi$ is $\mathcal{S}$-valid, written 
$\mathcal{S} \models \varphi$, if every computation of $\mathcal{S}$ satisfies $\varphi$. 

The safety closure [AL90] of a temporal formula $\varphi$ is the smallest safety 
property $\varphi_S$ such that $\varphi$ implies $\varphi_S$. For example, $(\Box p)_S = \Box p$ and 
$(\Diamond p)_S = \text{true}$. 
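
For ultimately periodic sequences the operators above can be evaluated directly. The following Python sketch assumes a sequence of the form prefix followed by a loop repeated forever, and an assertion given as a Boolean function on states; the function names and the dictionary encoding of states are illustrative assumptions.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Assertion = Callable[[State], bool]


def _check(loop: List[State]) -> None:
    if not loop:
        raise ValueError("the loop of a lasso must be nonempty")


def always(p: Assertion, prefix: List[State], loop: List[State]) -> bool:
    """sigma |= []p : p holds at every position."""
    _check(loop)
    return all(p(s) for s in prefix + loop)


def eventually(p: Assertion, prefix: List[State], loop: List[State]) -> bool:
    """sigma |= <>p : p holds at some position."""
    _check(loop)
    return any(p(s) for s in prefix + loop)


def eventually_always(p: Assertion, prefix: List[State], loop: List[State]) -> bool:
    """sigma |= <>[]p : from some position on, only loop states occur."""
    _check(loop)
    return all(p(s) for s in loop)


def always_eventually(p: Assertion, prefix: List[State], loop: List[State]) -> bool:
    """sigma |= []<>p : exactly the loop states occur infinitely often."""
    _check(loop)
    return any(p(s) for s in loop)
```

For instance, on a sequence whose loop consists of a single state at location 1 with `a` false, `always_eventually(lambda s: s["loc"] != 1, ...)` is false, so an implication with that antecedent holds vacuously while `eventually_always(lambda s: s["a"], ...)` fails.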

Example 2 

The temporal formula 

$\psi[i] : \Box\Diamond\, \neg\ell_1[i] \rightarrow \Diamond\Box\, a[i]$ , 

parameterized by $i$, states that if process $i$ is not in location $\ell_1$ infinitely often, 
then array element $a[i]$ will eventually become true and stay true. ■ 



3 Verification Diagrams 

A verification diagram $\mathcal{G}$ represents a proof that a possibly infinite-state system 
$\mathcal{S}$ satisfies a property $\varphi$ if it can be shown that $\mathcal{G}$ is both an overapproximation 
of the system and an underapproximation of the property. In other words, 

$\mathcal{L}(\mathcal{S}) \subseteq \mathcal{L}(\mathcal{G}) \subseteq \mathcal{L}(\varphi)$ 

where $\mathcal{L}(\mathcal{S})$, $\mathcal{L}(\mathcal{G})$, and $\mathcal{L}(\varphi)$ denote the languages of the system, diagram and 
property, respectively, each of which is a set of infinite sequences of states. 

The language inclusion $\mathcal{L}(\mathcal{S}) \subseteq \mathcal{L}(\mathcal{G})$, which states that every computation 
of $\mathcal{S}$ is a model of the diagram, is justified by proving a set of first-order 
verification conditions, using deductive techniques. On the other hand, the 
inclusion $\mathcal{L}(\mathcal{G}) \subseteq \mathcal{L}(\varphi)$, which states that every model of the diagram satisfies the 
property, is a decidable language inclusion check that can be established 
automatically using language-inclusion algorithms for $\omega$-automata. Thus, verification 
diagrams reduce the proof of an arbitrary temporal property over a system to 
the proof of a set of first-order verification conditions and an algorithmic check. 
3.1 Definition 

Verification diagrams are $\omega$-automata [Tho90] augmented with an additional 
node labeling $\mu$, to establish their relation with the FTS that they verify. The 
diagrams used in this paper are a modified version of those presented in [BMS95] 
and are described in detail in [MBSU98]. 

A diagram $\mathcal{G} : \langle N, N_0, E, \mu, \nu, \mathcal{F} \rangle$ over an FTS $\mathcal{S} : \langle V, \Theta, \mathcal{T}, \mathcal{J} \rangle$ and a property 
$\varphi$ consists of the following components: 

— $N$: a finite set of nodes; 

— $N_0 \subseteq N$: a set of initial nodes; 

— $E \subseteq N \times N$: a set of edges connecting nodes; 

— $\mu$: a node labeling function that assigns to each node an assertion over $V$; 

— $\nu$: a node labeling function, called the property labeling, that assigns to each 
node a boolean combination of the atomic assertions appearing in $\varphi$; 

— $\mathcal{F} \subseteq 2^N$: a (Muller) acceptance condition given by a set of sets of nodes. 

A path of a diagram is an infinite sequence of nodes $\pi : n_0, n_1, \ldots$, such that 
$n_0 \in N_0$ and for each $i \geq 0$, $(n_i, n_{i+1}) \in E$. Given a path $\pi$, its limit set, written 
$inf(\pi)$, is the set of nodes that occur infinitely often in $\pi$. Note that the limit 
set of an infinite path must be nonempty since the diagram is finite, and that 
it must be a strongly connected subgraph (SCS) of the diagram. A path $\pi$ of a 
diagram is accepting if $inf(\pi) \in \mathcal{F}$. 

Given an infinite sequence of states $\sigma : s_0, s_1, \ldots$, a path $\pi : n_0, n_1, \ldots$ is 
called a trail of $\sigma$ in the diagram if $s_i \models \mu(n_i)$ for all $i \geq 0$. A sequence of states 
$\sigma$ is a run of a diagram if there exists a trail of $\sigma$ in the diagram. The set of runs 
of a diagram $\mathcal{G}$ is written $\mathcal{L}_r(\mathcal{G})$. A sequence of states $\sigma : s_0, s_1, \ldots$ is a model 
of a diagram if there exists an accepting trail of $\sigma$ in the diagram. The set of 
models of a diagram $\mathcal{G}$ is written $\mathcal{L}(\mathcal{G})$. 
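
The notions of path, limit set and Muller acceptance are easy to make concrete for ultimately periodic paths. The Python sketch below assumes a path is given as a finite prefix followed by a loop that repeats forever; the four-node diagram at the end is hypothetical and serves only as an illustration.

```python
from typing import FrozenSet, List, Set, Tuple

Node = int
Edge = Tuple[Node, Node]


def is_lasso_path(initial: Set[Node], edges: Set[Edge],
                  prefix: List[Node], loop: List[Node]) -> bool:
    """Check that prefix followed by loop repeated forever is a path."""
    if not loop:
        return False                     # an infinite path needs a loop
    nodes = prefix + loop
    if nodes[0] not in initial:
        return False                     # n_0 must be an initial node
    steps = list(zip(nodes, nodes[1:])) + [(loop[-1], loop[0])]
    return all(step in edges for step in steps)


def limit_set(loop: List[Node]) -> FrozenSet[Node]:
    """Nodes occurring infinitely often: exactly the loop nodes."""
    return frozenset(loop)


def is_accepting(acceptance: Set[FrozenSet[Node]], loop: List[Node]) -> bool:
    """Muller acceptance: the limit set must be a member of F."""
    return limit_set(loop) in acceptance


# Hypothetical diagram with nodes 0..3 (not taken from the paper).
EDGES = {(0, 0), (0, 1), (1, 1), (1, 0), (0, 2), (2, 2), (2, 3), (3, 3)}
INITIAL = {0}
ACCEPTANCE = {frozenset({1}), frozenset({3})}
```

With these definitions, the path with prefix `[0, 1, 0, 2]` and loop `[3]` is a path of the hypothetical diagram and is accepting, whereas a path that alternates between nodes 0 and 1 forever has limit set `{0, 1}` and is not.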

3.2 Verification Conditions 

Associated with a diagram is a set of first-order verification conditions that, if 
valid, prove 

$\mathcal{L}(\mathcal{S}) \subseteq \mathcal{L}(\mathcal{G})$ . 

In this case we say that $\mathcal{G}$ is $\mathcal{S}$-valid. We use the following notation: 

For a set of nodes $M = \{n_0, \ldots, n_k\}$, we define 

$\mu(M) = \mu(n_0) \lor \ldots \lor \mu(n_k)$ 

where $\mu(\{\}) = \text{false}$. For a node $n$, the set of successor nodes of $n$ is $succ(n)$. We 
use Hoare triple notation to state that a parameterized transition $\tau$ leads from 
a state satisfying $\varphi$ to a state satisfying $\chi$: 

$\{\varphi\}\ \tau\ \{\chi\} : \forall i \,.\, (\rho_{\tau[i]} \land \varphi \rightarrow \chi')$ 

A diagram $\mathcal{G}$ is $\mathcal{S}$-valid if it satisfies the following conditions: 

— Initiation: Every initial state of $\mathcal{S}$ must be covered by some initial node of 
the diagram, that is, $\Theta \rightarrow \mu(N_0)$. 

— Consecution: For every node $n$ and every transition $\tau$, there is a successor 
node that can be reached by taking $\tau$, that is 

$\{ \mu(n) \}\ \tau\ \{ \mu(succ(n)) \}$ . 

The Initiation and Consecution conditions, if valid, prove that every run of $\mathcal{S}$ is 
a run of the diagram, that is, $\mathcal{L}_r(\mathcal{S}) \subseteq \mathcal{L}_r(\mathcal{G})$. 

A second set of verification conditions ensures that every computation of the 
system has an accepting trail in the diagram. Thus, if an SCS $S$ is not accepting, 
we must show that computations can always leave $S$ or cannot stay in $S$ forever. 

We say that an SCS $S$ has a fair exit transition $\tau$, if the following verification 
conditions are valid for every node $m \in S$: 

$\mu(m) \rightarrow enabled(\tau)$ and $\{ \mu(m) \}\ \tau\ \{ \mu(succ(m) - S) \}$ , 

that is, $\tau$ is enabled on every node in $S$, and from every node in $S$ transition $\tau$ 
can be taken to leave $S$. Thus if an SCS has a fair exit transition, there is at 
least one trail of every computation that can leave the SCS. 

We say that an SCS $S : \{n_1, \ldots, n_k\}$ is well-founded if there exist ranking 
functions $\{\delta_1, \ldots, \delta_k\}$ mapping the system states into a well-founded domain 
$(\mathcal{D}, \succ)$, such that the following verification conditions hold: there is a cut-set$^2$ $C$ 
of edges in $S$ such that for all edges $(n_1, n_2) \in C$ and every transition $\tau$, 

$\mu(n_1) \land \rho_\tau \land \mu'(n_2) \rightarrow \delta_1(n_1) \succ \delta_2'(n_2)$ , 

and for all other edges $(n_1, n_2) \notin C$ in $S$ and for all transitions $\tau$, 

$\mu(n_1) \land \rho_\tau \land \mu'(n_2) \rightarrow \delta_1(n_1) \succeq \delta_2'(n_2)$ . 

Thus, if an SCS $S$ is well-founded, no run can have a trail with limit set $S$, since 
it would violate the well-founded order. 

— Acceptance: Every nonaccepting SCS $S$ ($S \notin \mathcal{F}$) has a fair exit transition 
or is well-founded. 

The Acceptance condition ensures that every computation of the system has at 
least one accepting trail in the diagram, that is, $\mathcal{L}(\mathcal{S}) \subseteq \mathcal{L}(\mathcal{G})$. 
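
The Acceptance condition has to be discharged once per nonaccepting SCS, so it helps to enumerate them. The brute-force Python sketch below is an illustration for small diagrams only; it treats a set of nodes as an SCS when every node in the set reaches every node in the set (itself included) through at least one edge inside the set. The edge set is hypothetical, since the paper's figures are not reproduced in the text, and the function names are assumptions of the sketch.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Node = int
Edge = Tuple[Node, Node]


def reach(u: Node, nodes: FrozenSet[Node], edges: Set[Edge]) -> Set[Node]:
    """Nodes reachable from u by one or more edges staying inside nodes."""
    seen: Set[Node] = set()
    stack = [v for (x, v) in edges if x == u and v in nodes]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(w for (x, w) in edges if x == v and w in nodes)
    return seen


def is_scs(nodes: FrozenSet[Node], edges: Set[Edge]) -> bool:
    """True if the induced subgraph is strongly connected (can be a limit set)."""
    return bool(nodes) and all(reach(u, nodes, edges) == set(nodes) for u in nodes)


def nonaccepting_scss(all_nodes: Set[Node], edges: Set[Edge],
                      acceptance: Set[FrozenSet[Node]]) -> List[FrozenSet[Node]]:
    """Enumerate every SCS that is not a member of the acceptance condition."""
    result = []
    for size in range(1, len(all_nodes) + 1):
        for subset in combinations(sorted(all_nodes), size):
            s = frozenset(subset)
            if is_scs(s, edges) and s not in acceptance:
                result.append(s)
    return result


# Hypothetical four-node diagram (edges are an assumption of this sketch).
EDGES = {(0, 0), (0, 1), (1, 1), (1, 0), (0, 2), (2, 2), (2, 3), (3, 3)}
ACCEPTANCE = {frozenset({1}), frozenset({3})}
```

Each set returned by `nonaccepting_scss` then needs either a fair exit transition or a ranking function. The enumeration is exponential in the number of nodes, which is acceptable for hand-drawn diagrams but is a design shortcut, not a recommendation for large automata.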

3.3 Property Satisfaction 

It remains to justify that $\mathcal{L}(\mathcal{G}) \subseteq \mathcal{L}(\varphi)$, which is done using the property labeling 
$\nu$ of the diagram. Recall that the property labeling assigns to each node of the 
diagram a boolean combination of atomic assertions appearing in $\varphi$, the property 
to be proven. 

$^2$ A cut-set of an SCS $S$ is a set of edges $C$ such that the removal of $C$ from $S$ results 
in a subgraph that is not strongly connected. 
We say that a path $\pi : n_0, n_1, \ldots$ is a property trail of an infinite sequence 
of states $\sigma : s_0, s_1, \ldots$ if for all $i \geq 0$, $s_i \models \nu(n_i)$. An infinite sequence of states 
$\sigma$ is a property model of a diagram if it has an accepting property trail in the 
diagram. The set of property models of $\mathcal{G}$ is written $\mathcal{L}_p(\mathcal{G})$. 

Given a property labeling $\nu$, a diagram $\mathcal{G}$ defines a finite-state $\omega$-automaton 
$A_\mathcal{G}$ by interpreting the atomic assertions of $\nu$ as propositions. Similarly, the 
property $\varphi$ defines a finite-state $\omega$-automaton $A_\varphi$. The models of both $A_\mathcal{G}$ and 
$A_\varphi$ are infinite sequences of states that assign truth values to these atomic 
assertions. 

The verification conditions to prove Property Satisfaction are 

S1 for every node $n \in N$, $\mu(n) \rightarrow \nu(n)$, which can be shown deductively. 

S2 the language inclusion $\mathcal{L}(A_\mathcal{G}) \subseteq \mathcal{L}(A_\varphi)$ holds, which can be shown by 
standard decidable $\omega$-automata techniques. 

Condition S1 proves $\mathcal{L}(\mathcal{G}) \subseteq \mathcal{L}_p(\mathcal{G})$; from S2 the inclusion $\mathcal{L}_p(\mathcal{G}) \subseteq \mathcal{L}(\varphi)$ follows, 
and by transitivity we have $\mathcal{L}(\mathcal{G}) \subseteq \mathcal{L}(\varphi)$. 

Example 3 

Returning to program SIMPLE of Figure 1, it is our goal to verify that each 
process $i$ eventually sets $a[i]$ to true. That is, we want to prove 

$\varphi[i] : \Diamond\Box\, a[i]$ for all $i \in [1..M]$ . 

However, we first prove the weaker property 

$\psi[i] : \Box\Diamond\, \neg\ell_1[i] \rightarrow \Diamond\Box\, a[i]$ for all $i \in [1..M]$ , 

given in Example 2. 

Figure 2 shows the verification diagram $\mathcal{G}_1[i]$, parameterized by $i$. In the 
diagram, initial nodes are indicated by a sourceless edge going to the node. The 
diagram $\mathcal{G}_1[i]$ represents a proof of $\psi[i]$, for all $i \in [1..M]$. That is, for an arbitrary 
$i \in [1..M]$, $\mathcal{L}(\text{SIMPLE}) \subseteq \mathcal{L}(\mathcal{G}_1[i]) \subseteq \mathcal{L}(\psi[i])$. 

The acceptance condition, $\mathcal{F} = \{\{n_1\}, \{n_3\}\}$, states that every trail of a 
computation must eventually end up in nodes $n_1$ or $n_3$. To justify $\mathcal{L}(\mathcal{G}_1[i]) \subseteq 
\mathcal{L}(\psi[i])$, we have to show: S1 the property labeling $\nu$ is implied by the node 
labeling $\mu$, which is trivial in this case, and S2 the inclusion $\mathcal{L}(A_{\mathcal{G}_1[i]}) \subseteq \mathcal{L}(A_{\psi[i]})$ 
holds, which is obvious (note that $\psi[i]$ can be rewritten into $\Diamond\Box\, \ell_1[i] \lor \Diamond\Box\, a[i]$, 
to make the connection between the acceptance condition and the property more 
obvious). 

To justify $\mathcal{L}(\text{SIMPLE}) \subseteq \mathcal{L}(\mathcal{G}_1[i])$, we have to show Initiation, Consecution and 
Acceptance. Initiation and Consecution are easily shown. To show Acceptance, 
we have to show that the three nonaccepting SCSs are well-founded or have a fair 
exit transition. The SCS $\{n_0, n_1\}$ is shown to be well-founded with the ranking 
function $\delta : M + 1 - j[i]$ defined on both nodes, and the SCSs $\{n_0\}$ and $\{n_2\}$ have 
fair exit transitions $\ell_0[i]$ and $\ell_2[i]$, respectively. Therefore $\mathcal{L}(\text{SIMPLE}) \subseteq \mathcal{L}(\mathcal{G}_1[i])$ 
for $i \in [1..M]$. 

Fig. 2. Verification diagram $\mathcal{G}_1[i]$, proving $\psi[i] : \Box\Diamond\, \neg\ell_1[i] \rightarrow \Diamond\Box\, a[i]$ 

Note that we would not have been able to justify a nonaccepting $\{n_1\}$, since 
transition $\ell_1[i]$ is not guaranteed to be enabled on $n_1$, and therefore is not a fair 
exit transition. This led us to include $\{n_1\}$ in the acceptance condition, and thus 
prove the weaker property. ■ 

3.4 Previously Proven Properties 

Diagram verification enables the use of previously proven properties in several 
ways. As in verification by verification rules, invariants of the program can be 
added to the antecedents of all verification conditions. However, previously proven 
temporal properties can be used as well. 

Arbitrary temporal properties can be used to relax the Property Satisfaction 
condition as follows [BMS96]. Let $\mathcal{S} \models \varphi_1, \ldots, \mathcal{S} \models \varphi_n$, and let $\mathcal{G}$ be a diagram 
for $\mathcal{S}$ and $\varphi$. Then, for Property Satisfaction, it suffices to show 

$\mathcal{L}(\varphi_1) \cap \ldots \cap \mathcal{L}(\varphi_n) \cap \mathcal{L}(\mathcal{G}) \subseteq \mathcal{L}(\varphi)$ , 

instead of $\mathcal{L}(\mathcal{G}) \subseteq \mathcal{L}(\varphi)$. To perform this check, additional propositions, 
appearing in $\varphi_1 \ldots \varphi_n$, may have to be added to the diagram. 

Simple temporal properties can also be used to show that a diagram is 
$\mathcal{S}$-valid. We say that an SCS $S : \{n_1, \ldots, n_k\}$ is terminating if 

$\Box\Diamond\, \neg(\mu(n_1) \lor \ldots \lor \mu(n_k))$ 

is $\mathcal{S}$-valid, that is, every computation will always eventually leave the SCS. The 
Acceptance condition can now be relaxed to 

— Acceptance: Every nonaccepting SCS $S$ ($S \notin \mathcal{F}$) has a fair exit transition, 
is well-founded, or is terminating. 
4 Diagram Induction 

Proofs of progress properties of concurrent systems usually require the explicit 
identification of the transitions that ensure progress, called helpful transitions. 
For nonparameterized systems the number of distinct helpful transitions is 
bounded; ranking functions are used if these helpful transitions have to be taken an 
unknown number of times to achieve the goal. Therefore all helpful transitions 
can be explicitly represented in the diagram. 

The situation is different when the system is parameterized. In this case, 
achieving the goal may depend on an unbounded number of distinct helpful 
transitions, and thus they cannot be represented explicitly in the diagram. For 
example, in program SIMPLE, achieving $\Box\Diamond\, \neg\ell_1[i]$ may require a number of 
distinct transitions proportional to $M$. 

A possible solution in this case is to use existential diagrams, which assert 
the existence of a process index for which the transition guarantees progress. 
Existence must then be shown as part of the proof of the verification conditions. 



Example 4 

In Example 3 we succeeded in proving 

$\psi[i] : \Box\Diamond\, \neg\ell_1[i] \rightarrow \Diamond\Box\, a[i]$ for all $i \in [1..M]$ . 

If we are able to prove 

$\chi[i] : \Box\Diamond\, \neg\ell_1[i]$ for all $i \in [1..M]$ , 

we can conclude that the desired property, $\varphi[i] : \Diamond\Box\, a[i]$, holds. 

Figure 3 shows the existential diagram $\mathcal{G}_2[i]$, which could be used to prove 
$\chi[i]$. The diagram uses encapsulation conventions: a compound node labeled by 
an assertion $p$ adds $p$ as a conjunct to all of its subnodes, and an edge arriving 
at (or departing from) a compound node represents a set of edges that arrive at 
(or depart from) each of its subnodes. 

In the diagram $sup(i)$ stands for the set of process indices for which process 
$i$ is waiting, that is, those processes that have priority over $i$, 

$sup(i) = \{r \mid r < i \land \neg a[r]\}$ . 

The diagram states that as long as $sup(i) \neq \emptyset$, there exists a process $r$ at 
location $\ell_1$ or $\ell_2$, and, if at $\ell_1$, the process is enabled. Proving Initiation for 
this diagram is straightforward. However, proving Consecution is much harder 
than for the usual diagrams, due to the existential quantifiers in the verification 
conditions. For example, using Hoare triple notation, the consecution condition 
for $n_2$ and transition $\ell_2$ is 

$$
\begin{array}{l}
\{\ \ell_1[i] \land sup(i) \neq \emptyset \land \exists r\,.\,\ell_2[r]\ \} \\
\quad \ell_2 \\
\left\{ \begin{array}{l}
\ell_1[i] \land sup(i) \neq \emptyset \land \exists r\,.\,\ell_0[r] \ \lor \\
\ell_1[i] \land sup(i) \neq \emptyset \land \exists r\,.\,\ell_1[r] \land (a[j[r]] \lor r \leq j[r]) \ \lor \\
\ell_1[i] \land sup(i) \neq \emptyset \land \exists r\,.\,\ell_2[r] \ \lor \\
\ell_1[i] \land sup(i) = \emptyset \ \lor \\
\neg\ell_1[i]
\end{array} \right\}
\end{array}
$$

The proof of this verification condition requires the instantiation of $r$ with the 
process index of the process that is enabled if $sup(i) \neq \emptyset$. 

The verification conditions to justify Acceptance are slightly different from 
those presented in this paper, to ensure that identity of transitions is preserved 
for fairness. The full definition of existential diagrams is given in [MP96]. ■ 

We now describe our new approach, showing how mathematical induction 
on a process index can be used to simplify the diagrams and the corresponding 
verification conditions needed to prove a progress property over a system 
S consisting of M processes. 
The standard mathematical induction principle states that to prove $\varphi[i]$ 
for all natural numbers, it suffices to prove $\varphi[i]$ for an arbitrary $i$, assuming 
$\forall k < i \,.\, \varphi[k]$ holds. This principle is directly applicable to the proof of temporal 
properties. However, the requirement of a fixed order on the process indices 
severely limits its applicability. Therefore we introduce a principle of mathematical 
induction for diagrams that allows a state-dependent order, that is, the truth 
value of $k \prec i$ may change from state to state in a computation. 

Diagram induction requires an order function $\prec : \Sigma \to 2^{[1..M] \times [1..M]}$ that 
maps each state $s$ to a relation $\prec(s)$, such that for every $s \in \Sigma$, the relation 
$\prec(s)$ is a partial order on $[1..M]$ (that is, $\prec(s)$ is transitive and irreflexive). The 
order function $\prec$ is incorporated in the inductive hypothesis as follows. 
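
The requirement that $\prec(s)$ be transitive and irreflexive can be checked mechanically for a given state when $M$ is small. The Python sketch below is an illustration under stated assumptions: a state is represented only by a dictionary `number` of ticket values, and the order is the lexicographic ticket order that the paper uses later for BAKERY-M. The function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict

Order = Callable[[int, int], bool]


def is_strict_partial_order(M: int, prec: Order) -> bool:
    """Check that prec is irreflexive and transitive on [1..M]."""
    idx = range(1, M + 1)
    irreflexive = all(not prec(i, i) for i in idx)
    transitive = all(prec(i, k)
                     for i, j, k in product(idx, repeat=3)
                     if prec(i, j) and prec(j, k))
    return irreflexive and transitive


def ticket_order(number: Dict[int, int]) -> Order:
    """Order at one state: i precedes k iff (number[i], i) < (number[k], k)."""
    return lambda i, k: (number[i], i) < (number[k], k)
```

The order is state-dependent: with tickets `{1: 2, 2: 1}` process 2 precedes process 1, while with tickets `{1: 1, 2: 2}` the opposite holds, and both relations pass `is_strict_partial_order`.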

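Let $\varphi[i]$ be the property to be proven for all $i \in [1..M]$, let $\prec$ be an 
order function, and let $A_\varphi[i] : \langle N, N_0, E, \nu, \mathcal{F} \rangle$ be an $\omega$-automaton for $\varphi[i]$. Then 
the automaton for the inductive hypothesis for $\varphi[i]$ and $\prec$ is the $\omega$-automaton 
$A^{\prec}_\varphi[k] : \langle N', N_0', E', \nu', \mathcal{F}' \rangle$ obtained from $A_\varphi[i]$ as follows: 

$N' = N \cup \{n_e\}$ 

$N_0' = N_0 \cup \{n_e\}$ 

$E' = E \cup \{(n, n_e) \mid n \in N\} \cup \{(n_e, n) \mid n \in N\}$ 

$\nu'(n) = \nu(n)[k/i] \land k \prec i$ for $n \in N$ 

$\nu'(n_e) = \neg(k \prec i)$ 

$\mathcal{F}' = \mathcal{F} \cup \{S \mid n_e \in S\}$ 

Informally, this inductive hypothesis automaton $A^{\prec}_\varphi[k]$ includes those sequences 
of states that satisfy $\varphi[k]$ or that satisfy $\neg(k \prec i)$ infinitely often. 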
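
The construction of the inductive hypothesis automaton is purely syntactic and can be sketched in a few lines. The Python code below is a minimal illustration that follows the six defining equations as stated; node labels are kept as plain strings, the added node is called `"n_e"`, and the two-node automaton used as input is a hypothetical automaton for $\Diamond\Box\, a[i]$, not a transcription of the paper's Figure 4.

```python
from itertools import chain, combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Node = str
Edge = Tuple[Node, Node]


def subsets(nodes: Iterable[Node]):
    """All subsets of nodes, as tuples."""
    items = sorted(nodes)
    return chain.from_iterable(combinations(items, r)
                               for r in range(len(items) + 1))


def inductive_hypothesis(N: Set[Node], N0: Set[Node], E: Set[Edge],
                         nu: Dict[Node, str], F: Set[FrozenSet[Node]],
                         n_e: Node = "n_e"):
    """Build <N', N0', E', nu', F'> from <N, N0, E, nu, F>."""
    N_new = set(N) | {n_e}
    N0_new = set(N0) | {n_e}
    E_new = set(E) | {(n, n_e) for n in N} | {(n_e, n) for n in N}
    # nu'(n) = nu(n)[k/i] and k precedes i; the substitution is left symbolic
    nu_new = {n: f"({nu[n]})[k/i] and k ≺ i" for n in N}
    nu_new[n_e] = "not (k ≺ i)"
    # F' = F together with every set of nodes that contains n_e
    F_new = set(F) | {frozenset(s) | {n_e} for s in subsets(N)}
    return N_new, N0_new, E_new, nu_new, F_new


# Hypothetical automaton for "eventually always a[i]".
N = {"n0", "n1"}
N0 = {"n0", "n1"}
E = {("n0", "n0"), ("n0", "n1"), ("n1", "n1")}
NU = {"n0": "true", "n1": "a[i]"}
F = {frozenset({"n1"})}
```

Because every set containing the added node is accepting, $\mathcal{F}'$ grows exponentially with the number of nodes; an implementation meant for larger automata would more likely represent this part of the acceptance condition symbolically.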

Example 5 

Figure 4 shows the $\omega$-automaton and inductive hypothesis automaton for the 
property $\varphi[i] : \Diamond\Box\, a[i]$ and the order function $\prec$. ■ 

Lemma 1. The set of models of the inductive hypothesis of $\varphi[i]$ for $k$ with order 
function $\prec$ includes all models of $\varphi[k]$, that is, 

$\mathcal{L}(\varphi[k]) \subseteq \mathcal{L}(A^{\prec}_\varphi[k])$ 

Lemma 2. For every order function $\prec$, every sequence of states $\sigma \in \mathcal{L}(\varphi[k]_S)$ 
has a trail in $A^{\prec}_\varphi[k]$. 

With this definition of inductive hypothesis, we can now formulate the in- 
duction principle for diagrams: 



Diagram Induction Principle 

Consider a parameterized system $\mathcal{S}$, consisting of $M$ processes, and a property 
$\varphi[i]$ to be proven for all $i \in [1..M]$. Assume there exists a diagram $\mathcal{G}[i]$, 
parameterized by $i$, and an order function $\prec$ such that the following conditions hold for 
all $i \in [1..M]$: 

I1 $\prec(s)$ is a partial order on $[1..M]$ for each $\mathcal{S}$-accessible state $s$; 

I2 $\mathcal{S}$ satisfies the safety closure of $\varphi[i]$, that is, $\mathcal{L}(\mathcal{S}) \subseteq \mathcal{L}(\varphi[i]_S)$; 

I3 $\mathcal{G}[i]$ is $\mathcal{S}$-valid, that is, $\mathcal{L}(\mathcal{S}) \subseteq \mathcal{L}(\mathcal{G}[i])$; 

I4 there exists a set of indices $K \subseteq [1..M]$ such that the product of $\mathcal{G}[i]$ and the 
inductive hypothesis automata for $k \in K$ is included in $\mathcal{L}(\varphi[i])$, that is, 

$\mathcal{L}(\mathcal{G}[i]) \cap \left( \bigcap_{k \in K} \mathcal{L}(A^{\prec}_\varphi[k]) \right) \subseteq \mathcal{L}(\varphi[i])$ for some $K \subseteq [1..M]$ . 

Then $\mathcal{S} \models \varphi[i]$ for all $i \in [1..M]$. 

Fig. 4. $\omega$-Automaton (a) $A_\varphi[i]$ and inductive hypothesis automaton (b) $A^{\prec}_\varphi[k]$ for $\varphi[i] : \Diamond\Box\, a[i]$ 

Example 6 

Returning to program SIMPLE, we refine diagram $\mathcal{G}_1[i]$ shown in Figure 2 by 
splitting node $n_1$ into two nodes: $n_{11}$ where $\ell_1[i]$ is guaranteed to be enabled 
and $n_{12}$ where it is not enabled. The result is the verification diagram $\mathcal{G}_3[i]$ 
shown in Figure 5. The acceptance condition is $\mathcal{F} = \{\{n_3\}, \{n_{12}\}\}$. Initiation, 
Consecution and Acceptance are easily shown for this diagram, and thus 
$\mathcal{L}(\text{SIMPLE}) \subseteq \mathcal{L}(\mathcal{G}_3[i])$. For the property 

$\psi[i] : \Box\Diamond\, a[j[i]] \rightarrow \Diamond\Box\, a[i]$ for all $i \in [1..M]$ , 

Property Satisfaction is also easy to show, and thus $\mathcal{L}(\mathcal{G}_3[i]) \subseteq \mathcal{L}(\psi[i])$, and 
therefore $\text{SIMPLE} \models \psi[i]$ for all $i \in [1..M]$. 

However, we claim that by diagram induction diagram $\mathcal{G}_3[i]$ also represents 
a proof of the desired property 

$\varphi[i] : \Diamond\Box\, a[i]$ for all $i \in [1..M]$ , 

using the order function $\prec$ defined as the less-than relation ($<$) on $[1..M]$ at all 
states. Premise I1 clearly holds. The safety closure of $\varphi$ is $\varphi_S : \text{true}$, so premise 
I2 holds trivially. The diagram was shown to be $\mathcal{S}$-valid earlier, thus satisfying 
premise I3. Finally, by taking $K$ to be the singleton set $\{j[i]\}$, the intersection 
of $\mathcal{G}_3[i]$ of Figure 5 and $A^{\prec}_\varphi[j[i]]$ of Figure 4(b) is included in the property 
$\Diamond\Box\, a[i]$ of Figure 4(a), since the inductive hypothesis for $j[i]$ eliminates the 
SCS $\{n_{12}\}$. Therefore, by the diagram induction principle, we can conclude that 
$\text{SIMPLE} \models \Diamond\Box\, a[i]$ for all $i \in [1..M]$. ■ 

Fig. 5. Verification diagram $\mathcal{G}_3[i]$, proving $\varphi[i] : \Diamond\Box\, a[i]$ by diagram induction 



Theorem 1 (Soundness). The Diagram Induction Principle is sound. 

Proof. Assume that the premises I1 through I4 hold, and suppose, for a 
contradiction, that $\mathcal{S} \not\models \varphi[k_1]$ for some $k_1 \in [1..M]$. Then there exists a computation 
$\sigma : s_0, s_1, \ldots$ of $\mathcal{S}$ such that $\sigma \not\models \varphi[k_1]$, that is, $\sigma \notin \mathcal{L}(\varphi[k_1])$. By premise I3 we 
have $\mathcal{L}(\mathcal{S}) \subseteq \mathcal{L}(\mathcal{G}[k_1])$, and therefore $\sigma \in \mathcal{L}(\mathcal{G}[k_1])$. But then, by premise I4, there 
must exist some $k_2$ such that $\sigma \notin \mathcal{L}(A^{\prec}_\varphi[k_2])$. By premise I2, $\sigma \in \mathcal{L}(\varphi[k_2]_S)$, 
and thus by Lemma 2 the sequence $\sigma$ has a trail in $A^{\prec}_\varphi[k_2]$. From the fact that 
the trail is not accepting we can conclude that the trail eventually has to end up 
outside the added node $n_e$, since all sets that include this node are accepting. 
All nodes in $A^{\prec}_\varphi[k_2]$ outside $n_e$ are labeled by $k_2 \prec k_1$, and therefore we have 
$\sigma \models \Diamond\Box\,(k_2 \prec k_1)$. In addition, by Lemma 1, we know that $\sigma \not\models \varphi[k_2]$. 

By the same reasoning we can conclude that there exists some $k_3$ such that 
$\sigma \not\models \varphi[k_3]$ and $\sigma \models \Diamond\Box\,(k_3 \prec k_2)$. Repeating this argument $M$ times 
we can conclude that 

$\sigma \models \Diamond\Box\,(k_{M+1} \prec k_M) \land \ldots \land \Diamond\Box\,(k_2 \prec k_1)$ , 

and therefore 

$\sigma \models \Diamond\Box\,((k_{M+1} \prec k_M) \land \ldots \land (k_2 \prec k_1))$ , 

and thus there exists a particular state $s_j$ in $\sigma$ such that 

$s_j \models k_{M+1} \prec k_M \prec \ldots \prec k_1$ . 

However, some process index $k$ must appear twice in this sequence, violating 
premise I1, that $\prec$ is a partial order, a contradiction. 



5 Example: Accessibility for bakery-m 



In this section we give an outline of the proof of accessibility for the 
parameterized system BAKERY-M, shown in Figure 6. This program, taken from [Pnu96], 
is based on Lamport’s bakery algorithm [Lam74,Lam77] for mutual exclusion. 

    in    M        : integer where M > 0 
    local choosing : array [1..M] of boolean where choosing = false 
          number   : array [1..M] of integer where number = 0 

    ||  P[i] ::                          (i = 1, ..., M) 
        local j, k, t : integer where j = k = 0, t = 0 

        ℓ0: loop forever do 
              ℓ1:  noncritical 
              ℓ2:  choosing[i] := true 
              ℓ3:  t := 0 
              ℓ4:  for k = 1 to M do 
                     ℓ5: t := max(t, number[k]) 
              ℓ6:  number[i] := t + 1 
              ℓ7:  choosing[i] := false 
              ℓ8:  for j = 1 to M do 
                     ℓ9:  await ¬choosing[j] 
                     ℓ10: await j = i ∨ number[j] = 0 
                                ∨ (number[i], i) ≺ (number[j], j) 
              ℓ11: critical 
              ℓ12: number[i] := 0 

Fig. 6. Program BAKERY-M: molecular version 
In the program, in statements $\ell_2$ through $\ell_7$, process $i$ determines the 
maximum ticket number currently held by any other process and assigns it, 
incremented by 1, to its own ticket number. In the coarser-grained atomic version of 
the program these statements are replaced by 

$\ell_1 : number[i] := 1 + max(number)$ . 

In statements $\ell_8$ through $\ell_{10}$ process $i$ can proceed only if every other process has 
a ticket number 0 (meaning it is not interested in entering the critical section), 
or its ticket number is higher than that of process $i$. In the atomic version these 
statements become 

$\ell_2 : \textbf{await}\ \forall j : j \neq i \rightarrow (number[j] = 0 \lor number[i] < number[j])$ . 

In [Lam77,Pnu96] it is shown that BAKERY-M guarantees mutual exclusion 
and accessibility, where in [Pnu96] accessibility is proven using existential 
diagrams. Here we present an alternative proof of accessibility that uses diagram 
induction. 

The proof of accessibility, specified by 

$acc[i] : \Box\,(at\_\ell_2[i] \rightarrow \Diamond\, at\_\ell_{11}[i])$ for all $i \in [1..M]$ , 

is represented by four verification diagrams. The $\mathcal{S}$-validity of the diagrams relies 
on the following invariants, taken from [Pnu96], 



$$
\begin{array}{l}
choosing[i] \leftrightarrow at\_\ell_{3..7}[i] \\[2pt]
at\_\ell_{7..12}[i] \rightarrow 0 < number[i] \\[2pt]
(at\_\ell_{4,5}[i] \rightarrow 1 \leq k[i] \leq M + 1) \land (at\_\ell_6[i] \rightarrow k[i] = M + 1) \\[2pt]
(at\_\ell_{8..10}[i] \rightarrow 1 \leq j[i] \leq M + 1) \land (at\_\ell_{9,10}[i] \rightarrow 1 \leq j[i] \leq M) \\[2pt]
at\_\ell_{11}[i] \rightarrow j[i] = M + 1 \\[2pt]
at\_\ell_{10}[i] \land choosing[j[i]] \rightarrow superior[i, j[i]]
\end{array}
$$

where 

$$
superior[i, r] : \left( \begin{array}{l}
at\_\ell_{4..6}[r] \land (k[r] \leq i \lor number[i] \leq t[r]) \\
\lor\ at\_\ell_{7..12}[r] \land (number[i], i) \prec (number[r], r)
\end{array} \right)
$$

The first diagram, $\mathcal{G}_4[i]$, shown in Figure 7, proves that accessibility holds 
provided the program will always leave locations $\ell_9$ and $\ell_{10}$: 

$\varphi_4[i] : (\Box\Diamond\, \neg\ell_9[i] \land \Box\Diamond\, \neg\ell_{10}[i]) \rightarrow \Box\,(\ell_2[i] \rightarrow \Diamond\, \ell_{11}[i])$ 

The diagram is a straightforward representation of the path that leads from 
location $\ell_2$ to $\ell_{11}$. Initiation and Consecution clearly hold for this diagram. To 
justify the acceptance condition, $\mathcal{F} = [\ldots] \cup \{S \mid n_9 \in S\}$, we 
have to show that all nonaccepting SCSs have a fair exit or are well-founded. It 
is easy to see that all nonaccepting single-node SCSs have a fair exit transition. 
The two remaining SCSs, $\{n_2, n_3\}$ and $\{n_6, n_7, n_8\}$, are shown to be well-founded 
using the ranking functions $\delta(n_2) = \delta(n_3) = M + 1 - k[i]$, and $\delta(n_6) = \delta(n_7) = 
\delta(n_8) = M + 1 - j[i]$, respectively. The well-foundedness of $M + 1 - k[i]$ and 
$M + 1 - j[i]$ relies on the invariants $I_2$ and $I_4$ respectively. 

It remains to show that the system cannot forever stay in nodes $n_7$ and $n_8$, 
that is, 

$\psi_1[i] : \Box\Diamond\, \neg at\_\ell_9[i]$ for all $i \in [1..M]$ 

$\psi_2[i] : \Box\Diamond\, \neg at\_\ell_{10}[i]$ for all $i \in [1..M]$ 

Two diagrams, not included in this paper, prove that for all $i \in [1..M]$ 

$\Box\Diamond\, \neg choosing[j[i]] \rightarrow \Box\Diamond\, \neg at\_\ell_9[i]$ 

and 

$\Box\Diamond\, \neg choosing[i]$ 

respectively, from which $\psi_1[i]$ can be concluded for all $i \in [1..M]$. 

The diagram Qr[i]^ shown in Figure 8, represents a proof of ^ 2 ^ using dia- 
gram induction. Informally, the nodes no,...,n 5 represent the situation that 
process j[i] has priority over process i to enter its critical section. In node ne, 
process i has priority over j[i] and on this node transition ^ 10 [^] is guaranteed to 
be enabled, leading to the goal node nr. 

Initiation is easily shown for this diagram. Consecution requires several of 
the invariants. In particular the verification condition 

{ /i(ne) } 4 { V /x(n,7) } 

represents a crucial part of the proof, namely that once process j[i] has left its 
critical section, it will not return with a lower ticket number than process i while 
process i is at £iq. 

To justify the acceptance condition T = {{u^s}} U {S'jnrGS'}, all single- 
node, nonaccepting SCS’s, except {n 2 } are shown to have fair exit transitions; 
{n 2 } is shown to be terminating by and the SCS {ni,n 2 ,n 3 } is esta- 

blished to be well-founded, with ranking functions ^(ni) = 6 [ 712 ) = S[ns) = 
M + 1 — whose well-foundedness relies on the invariant I 4 , 

Without using induction the diagram represents a proof of 

(fr[i] ' 0 1 ^01 ^£io[i] . 

However, if we apply the diagram induction principle with order function 
defined by 

i k iff {number[i]^i) < [number [k]^k) 

at each state, and take K to be the singleton set {j[i]}, the offending SCS {ns} 
is eliminated and the diagram represents a proof of ^ 2 ^ for ull i G [1..M], as 
desired. This completes the proof of accessibility for bakery-M.

Extended Abstract

Although testing is the most widely used technique to control the quality of 
software systems, it is a topic that, until relatively recently, has received scant 
attention from the computer research community. Although some pioneering 
work was already done a considerable time ago [Cho78,GG83,How78,Mye79], the 
testing of software systems has never become a mainstream activity of scientific 
research. The reasons that are given to explain this situation usually include 
arguments to the effect that testing as a technique is inferior to verification - 
testing can show only the presence of errors, not their absence - and that we
should therefore concentrate on developing theory and tools for the latter. It has 
also been frequently said that testing is by its very nature a non-formal activity, 
where formal methods and related tools are at best of little use. 

The first argument is incorrect in the sense that it gives an incomplete pic- 
ture of the situation. Testing is inferior to verification if the verification model 
can be assumed to be correct and if its complexity can be handled correctly by 
the person and/or tool involved in the verification task. If these conditions are
not fulfilled, which is frequently the case, then testing is often the only available 
technique to increase the confidence in the correctness of a system. In this talk 
we will show that the second argument is flawed as well. It is based on the iden- 
tification of testing with robustness testing, where it is precisely the objective to 
find out how the system behaves under unspecified circumstances. This excludes
the important activity of conformance testing, which tries to test the extent to
which system behaviour conforms to its specification. It is precisely in this area 
where formal methods and tools can help to derive tests systematically from spe- 
cifications, which is a great improvement over laborious, error-prone and costly 
manual test derivation. 

In our talk we show how the process algebraic testing theory due to De 
Nicola and Hennessy [DNH84,DeN87], originally conceived out of semantic con- 
siderations, may be used to obtain principles for test derivation. We will give an 
overview of the evolution of these ideas over the past ten years or so, starting 
with the conformance testing theory of simple synchronously communicating 
reactive systems [Bri88,Lan90] and leading to realistic systems that involve
sophisticated asynchronous message passing mechanisms [Tre96,HT97]. Written
accounts can be found in [BHT97,He98]. We discuss how such ideas have been
used to obtain modern test derivation tools, such as TVEDA and TGV [Pha94, 
CGPT96,FJJV96], and the tool set that is currently being developed in the 
Cote-de-Resyste project [STW96]. The advantage of a test theory that is based
on well-established process algebraic theory is that in principle there exists
a clear link between testing and verification, which allows the areas to share 
ideas and algorithms [FJJV96,VT98]. Time allowing, we look at some of the 
methodological differences and commonalities between model checking techniques
and testing, one of the differences being that of state space coverage, and
an important commonality that of test property selection. 

In recent years the research into the use of formal methods and tools for 
testing reactive systems has seen a considerable growth. An overview of different 
approaches and schools of thought can be found in [BPS98], reporting on the first
ever Dagstuhl seminar devoted to testing. The formal treatment of conformance 
testing based on process algebra and/or concurrency theory is certainly not the 
only viable approach. An important school of thought is the FSM-testing theory 
grown out of the seminal work of Chow [Cho78], of which a good overview is given
in [LY96]. Another interesting formal approach to testing is based on abstract 
data type theory [Gau95,BGM91]. 



References 



[BGM91]  G. Bernot, M.-C. Gaudel, and B. Marre. Software testing based on formal
         specifications: a theory and a tool. Software Engineering Journal, 1991
         (November): 387-405.

[Bri88]  E. Brinksma. A theory for the derivation of tests. In: S. Aggarwal and K.
         Sabnani, editors, Protocol Specification, Testing, and Verification VIII,
         63-74, North-Holland, 1988.

[BHT97]  E. Brinksma, L. Heerink, and J. Tretmans. Developments in testing
         transition systems. In: M. Kim, S. Kang, and K. Hong, editors, Tenth Int.
         Workshop on Testing of Communicating Systems, 143-166, Chapman &
         Hall, 1997.

[BPS98]  E. Brinksma, J. Peleska, and M. Siegel, editors. Test Automation for
         Reactive Systems - Theory and Practice, Dagstuhl Seminar report 223
         (98361), Schloß Dagstuhl, Germany, 1998.

[Cho78]  T.S. Chow. Testing software design modeled by finite-state systems.
         IEEE Transactions on Software Engineering, 4(3): 178-187, 1978.

[CGPT96] M. Clatin, R. Groz, M. Phalippou, and R. Thummel. Two approaches
         linking test generation with verification techniques. In: A. Cavalli and S.
         Budkowski, editors, Eighth Int. Workshop on Testing of Communicating
         Systems. Chapman & Hall, 1996.

[DeN87]  R. De Nicola. Extensional equivalences for transition systems. Acta
         Informatica, 24:211-237, 1987.

[DNH84]  R. De Nicola and M.C.B. Hennessy. Testing equivalences for processes.
         Theoretical Computer Science, 34:83-133, 1984.

[FJJV96] J.-C. Fernandez, C. Jard, T. Jéron, and C. Viho. Using on-the-fly
         verification techniques for the generation of test suites. In: R. Alur and T.A.
         Henzinger, editors, Computer Aided Verification CAV'96. LNCS 1102,
         Springer-Verlag, 1996.

[Gau95]  M.-C. Gaudel. Testing can be formal, too. In: P.D. Mosses, M. Nielsen,
         and M.I. Schwarzbach, editors, TAPSOFT'95: Theory and Practice of
         Software Development, 82-96, LNCS 915, Springer-Verlag, 1995.

[GG83]   J.B. Goodenough and S.L. Gerhardt. Toward a theory of test data
         selection. IEEE Transactions on Software Engineering, 9(2), 1983.

[He98]   L. Heerink. Ins and Outs in Refusal Testing. Doctoral dissertation.
         University of Twente, The Netherlands, 1998.

[HT97]   L. Heerink and J. Tretmans. Refusal Testing for classes of transition
         systems with inputs and outputs. In: T. Mizuno, N. Shiratori, T.
         Higashino, and A. Togashi, editors, Formal Description Techniques and
         Protocol Specification, Testing, and Verification FORTE X/PSTV XVII,
         23-38, Chapman & Hall, 1997.

[How78]  W.E. Howden. Algebraic program testing. Acta Informatica, 10:53-66,
         1978.

[Mye79]  G.J. Myers. The Art of Software Testing. John Wiley & Sons Inc., 1979.

[Lan90]  R. Langerak. A testing theory for LOTOS using deadlock detection. In:
         E. Brinksma, G. Scollo, and C.A. Vissers, editors, Protocol Specification,
         Testing, and Verification IX, 87-98, North-Holland, 1990.

[LY96]   D. Lee and M. Yannakakis. Principles and methods for testing finite state
         machines. Proceedings of the IEEE. August 1996.

[Pha94]  M. Phalippou. Relations d'implémentation et hypothèses de test sur des
         automates à entrées et sorties. PhD Thesis, Université de Bordeaux I,
         France, 1994.

[STW96]  Dutch Technology Foundation STW. Cote-de-Resyste - COnformance
         TEsting of REactive SYSTEms, project TIF.4111. University of
         Twente, Eindhoven University of Technology, Philips Research,
         KPN Research, Utrecht, The Netherlands, 1996. URL:
         http://fmt.cs.utwente.nl/projects/CdR-html/.

[Tre96]  J. Tretmans. Test Generation with inputs, outputs, and quiescence.
         Software - Concepts and Tools, 17(3): 103-120, 1996.

[VT98]   R.G. de Vries and J. Tretmans. On-the-fly conformance testing using
         SPIN. In: G. Holzmann, E. Najm, and A. Serhrouchni, editors, Fourth
         Workshop on Automata Theoretic Verification with the SPIN Model
         Checker, ENST 98 S 002, 115-128, Paris, France, 1998.




Proof of Correctness of a Processor 
with Reorder Buffer Using the 
Completion Functions Approach * 



Ravi Hosabettu¹, Mandayam Srivas², and Ganesh Gopalakrishnan¹

¹ Department of Computer Science, University of Utah, Salt Lake City, UT 84112,
hosabett,ganesh@cs.utah.edu

² Computer Science Laboratory, SRI International, Menlo Park, CA 94025,
srivas@csl.sri.com



Abstract. The Completion Functions Approach was proposed in [HSG98] 
as a systematic way to decompose the proof of correctness of pipelined 
microprocessors. The central idea is to construct the abstraction function 
using completion functions, one per unfinished instruction, each of which 
specifies the effect (on the observables) of completing the instruction. In 
this paper, we show that this “instruction-centric” view of the completion 
functions approach leads to an elegant decomposition of the proof for an 
out-of-order execution processor with a reorder buffer. The proof does not 
involve the construction of an explicit intermediate abstraction, makes 
heavy use of strategies based on decision procedures and rewriting, and 
addresses both safety and liveness issues with a clean separation between 
them. 

1 Introduction 

For formal verification to be successful in practice, it is not only important to
raise the level of automation provided but also essential to develop methodologies
that scale verification to large state-of-the-art designs. One of the reasons for
the relative popularity of model checking in industry is that it is automatic when 
readily applicable. A technology originating from the theorem proving domain 
that can potentially provide a similarly high degree of automation is one that 
makes heavy use of decision procedures for the combined theory of boolean ex- 
pressions with uninterpreted functions and linear arithmetic [CRSS94,BDL96]. 
Just as model checking suffers from a state-explosion problem, a verification 
strategy based on decision procedures suffers from a “case-explosion” problem. 
That is, when applied naively, the sizes of the terms generated and the number of
examined cases during validity checking explode. Just as compositional model
checking provides a way of decomposing the overall proof and reducing the effort
for an individual model checker run, a practical methodology for decision
procedure-centered verification must prescribe a systematic way to decompose
the correctness assertion into smaller problems that the decision procedures can 
handle. 

In [HSG98], we proposed such a methodology for pipelined processor verifi- 
cation called the Completion Functions Approach. The central idea behind this 
approach is to define the abstraction function as a composition of a sequence 
of completion functions, one for every unfinished instruction, in their program 
order. A completion function specifies how a partially executed instruction is to 
be completed in an atomic fashion, that is, the desired effect on the observables of
completing that instruction. Given such a definition of the abstraction function
in terms of completion functions, the methodology prescribes a way of orga- 
nizing the verification into proving a hierarchy of verification conditions. The 
methodology has the following attributes: 

• The verification proceeds incrementally making debugging and error tracing 
easier. 

• The verification conditions and most of the supporting lemmas needed to 
support the incremental methodology can be generated systematically. 

• Every generated verification condition and lemma can be proved, often au- 
tomatically, using a strategy based on decision procedures and rewriting. 

In summary, the completion functions approach strikes a balance between full 
automation that (if at all possible) can potentially overwhelm the decision proce- 
dures, and a potentially tedious manual proof. This methodology is implemented 
using PVS [ORSvH95] and was applied (in [HSG98]) to three processor examples: 
DLX [HP90], dual-issue DLX, and a processor that exhibited limited out-of-order 
execution capability. An attribute common to all these processors was that the 
maximum number of instructions pending at any time in the pipeline was small 
and fixed, which made the completion functions approach readily amenable for 
these examples. It was an open question if the approach would be practical, even 
if applicable, to verify a truly out-of-order execution processor with a reorder 
buffer. Such a processor can have scores of pending instructions in the reorder 
buffer potentially making the task of defining completion functions tedious and 
possibly exploding the number of generated verification conditions. 

In this paper, we demonstrate that the completion functions approach is 
well-suited to the verification of out-of-order execution processors by verifying 
an example processor (a simplified model, based on the P6 design) with a reorder 
buffer and generic execution units and without any data size bounds. We observe 
that regardless of how many instructions are pending in the reorder buffer, the 
instructions can only be in one of four distinct states. We exploit this fact to 
provide a single compact parameterized completion function applicable to all 
the pending instructions in the buffer. The abstraction function is then defined 
as a simple recursive function that completes all the pending instructions in 
the order in which they are stored in the reorder buffer. The proof is organized 
as a single parameterized verification condition, which is proved using a simple 
induction on the number of instructions in the buffer. The different cases of the 
induction are generated on the basis of how an instruction makes a transition
from its present state to its next state. We make heavy use of an automatic
case-analysis strategy and certain other strategies based on decision procedures 
and rewriting in discharging these different cases. This same observation about 
instruction state transitions is used in providing a proof of liveness too. 

Related work: The problem of verifying the control logic of out-of-order 
execution processors has received considerable attention in the last couple of 
years using both theorem proving and model checking approaches. The following 
yardsticks can be used to evaluate the various approaches: (1) the amount and 
complexity of information required from the user, (2) the complexity of the
manual steps of the methodology, and (3) the level of automation with which the
obligations generated by the methodology can be verified.

Two theorem-proving-based verifications of a similar design are described
in [JSD98] and [PA98]. The idea in [JSD98] is to first show that for every out- 
of-order execution sequence that contains as many as n unretired instructions at 
any time there exists an “equivalent” (max-1) execution containing at most 1
unretired instruction by constructing a suitable controller schedule. It then shows
the equivalence between a max-1 execution and the ISA level. The induction 
required in the first step, which was not mechanized, is very complicated. The 
verifier needs a much deeper insight into the control logic to exhibit a control 
schedule and to discharge the generated obligations in the first step than
is needed for constructing the completion functions and discharging the generated
verification conditions. Whereas our verification makes no assumption on
the time taken by the execution units, the mechanized part of their first step 
bounds the execution time. The proofs mix safety and liveness issues and the 
verification of liveness issues is not addressed. And the complexity of the reach- 
ability invariants needed in their approach and the effort required to discharge 
them is not clear; few details are provided in the paper. 

The verification in [PA98] is based on refinement by using “synchronization 
on instruction retirement” to reduce the complexity of the refinement relations 
to be proved. Although they do not need any flushing mechanism, there is no 
systematic method to generate the invariants and obligations needed and hence 
their mechanization is not as automatic as ours. And they do not address liveness 
issues needed to complete the proof. 

In [SH98], verification of a processor model with a reorder buffer, exceptions, 
and speculative execution is carried out. Their approach relies on constructing 
an explicit intermediate abstraction (called MAETT) and expressing invariant 
properties over this. Our approach avoids the construction of an intermediate 
abstraction and hence requires significantly less manual effort. 

In [McM98], McMillan uses compositional model checking and aggressive
symmetry reductions to manually decompose the proof of a processor imple- 
menting Tomasulo’s algorithm (without a reorder buffer) into smaller correct- 
ness obligations via refinement maps. Setting up the refinement maps requires 
information similar to that provided by the completion functions in addition to 
some details of the design. An advantage of model checking is that it does not
need any reachability invariants to check the refinement maps, although the user
has to give hints about the environment assumptions to be used. 

The rest of the paper is organized as follows: In Section 2, we describe our 
processor model. Section 3 describes our correctness criteria and provides a brief 
overview of our approach applied to examples mentioned earlier in [HSG98]. This 
is followed by the proof of correctness in Section 4 and finally the conclusions. 

2 Processor Model 



Fig. 1. The block diagram model of our implementation

The implementation model of an out-of-order execution processor that we 
consider in this paper is shown in Figure 1. A reorder buffer is used to main- 
tain the program order of the instructions so that they can be committed in 
that order to respect the ISA semantics. (rb_end points to the earliest issued 
instruction and rb_front points to the first available free slot in the buffer).
A register translation table (RTT) is maintained to provide the identity of the 
latest pending instruction writing a particular register. The model has a dis- 
patch buffer (of size z; the dispatch buffer entries are also called “reservation 
stations” in other literature) where instructions wait before being sent to the 
execution units. There are m execution units represented by an uninterpreted 
function (z and m are parameters to our implementation model). A scheduler 
controls the movement of the instructions through the execution pipeline (such 
as being dispatched, executed etc.) and its behavior is modeled by axioms (to al- 
low us to concentrate on the processor “core”). Instructions are fetched from the 
instruction memory (using a program counter which then is incremented); and 
the implementation also takes a no_op input, which suppresses an instruction 
fetch when asserted. 

An instruction is issued by allocating an entry for it at the front of the re- 
order buffer and a free entry in the dispatch buffer (New_slot). No instruction is 
issued if the dispatch buffer is full or if no_op is asserted. The RTT entry
corresponding to the destination of the instruction is updated to reflect the fact that
the instruction being issued is the latest one to write that register. If the source
operand is not being written by a previously issued pending instruction (checked 
using the RTT) then its value is obtained from the register file, otherwise the re- 
order buffer index of the instruction providing the source operand is maintained 
(in the dispatch buffer entry). Issued instructions wait in the dispatch buffer for 
their source operand to become ready, monitoring the execution units if they 
produce the value they are waiting for. An instruction can be dispatched when 
its source operand is ready and a free execution unit is available. The Dispatch? and
Dispatch_slot outputs from the scheduler (each an m-wide vector) determine
whether or not to dispatch an instruction to a particular execution unit and the 
dispatch buffer entry from where to dispatch. As soon as an instruction is dis- 
patched, its dispatch buffer entry is freed. Dispatched instructions get executed 
after a non-deterministic amount of time as determined by the scheduler output 
Execute?. The results of executed instructions are written back to their respective
reorder buffer entries as well as forwarded to those instructions waiting for this 
result (at a time determined by the Write_back? output of the scheduler). If 
the instruction at the end of the reorder buffer has written back its result, then 
that instruction can be retired by copying the result value to the register file (at 
a time determined by the Retire? output of the scheduler). Also, if the RTT 
entry for the destination of the instruction being retired is pointing to the end, 
then that entry is updated to reflect the fact that the value of that register is in the
register file. 

Our simplified model does not have memory or branch instructions and does 
not handle exceptions. For simplicity, multiple instruction issue or retirement is 
not allowed in a single cycle (but multiple instructions can be simultaneously 
dispatched or written back). Also, the reorder buffer is implemented as an un- 
bounded buffer as opposed to a circular queue.¹

At the specification level, the state is represented by a register file, a program 
counter and an instruction memory. Instructions are fetched from the instruc- 
tion memory, executed, result written back to the register file and the program 
counter incremented in one clock cycle. 

3 Our Correctness Criteria 

Intuitively, a pipelined processor is correct if the behavior of the processor start- 
ing in a flushed state (i.e., no partially executed instructions), executing a pro- 
gram and terminating in a flushed state is emulated by an ISA level specifica- 
tion machine whose starting and terminating states are in direct correspondence 
through projection. This criterion is shown in Figure 2(a) where I_step is the 
implementation transition function, A_step is the specification transition func- 
tion, and projection extracts those implementation state components visible 
to the specification (i.e., observables). This criterion can be proved by an easy 
induction on n once the commutative diagram condition shown in Figure 2(b)
is proved on a single implementation machine transition (and a certain other
condition discussed in the next paragraph holds).

¹ Using a bounded reorder buffer will not complicate the methodology but makes
setting up the induction more involved.



Fig. 2. Pipelined microprocessor correctness criteria

The criterion in Figure 2(b) states that if the implementation machine starts
in an arbitrary reachable state impl_state and the specification machine starts
in a corresponding specification state (given by an abstraction function ABS),
then after executing a transition their new states correspond. Further, ABS must
be chosen so that for all flushed states fs the projection condition ABS(fs) =
projection(fs) holds. The commutative diagram uses a modified transition
function A_step', which denotes zero or more applications of A_step, because
an implementation transition from an arbitrary state might correspond to
executing in the specification machine zero instructions (e.g., if the implementation
machine stalls without fetching an instruction) or more than one instruction
(e.g., if multiple instructions are fetched in a cycle). The number of instructions
executed by the specification machine is provided by a user-defined
synchronization function on implementation states. One of the crucial proof obligations is to
show that this function does not always return zero (No_indefinite_stutter
obligation). One also needs to prove that the implementation machine will eventually
reach a flushed state if no more instructions are inserted into the machine, to
make sure that the correctness criterion in Figure 2(a) is not vacuous
(Eventual_flush obligation). In addition, the user may need to discover invariants to
restrict the set of impl_state considered in the proof of Figure 2(b) and prove
that it is closed under I_step.
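The commutative-diagram condition can be illustrated on a deliberately tiny model. The following Python sketch is not the processor of this paper: it assumes a hypothetical one-slot pipeline in which a fetched instruction is latched in `pending` and written to the register file on the next cycle, and all names (`PROGRAM`, `Impl`, `Spec`, `i_step`, `a_step`, `abs_fn`, `sync`) are illustrative. It checks by bounded testing, not by proof, that ABS(I_step(q)) equals A_step applied sync(q) times to ABS(q).

```python
from typing import Dict, NamedTuple, Optional, Tuple

# Hypothetical toy ISA: (dest, src, imm) means dest := src + imm.
Instr = Tuple[str, str, int]
PROGRAM = [("r1", "r0", 1), ("r2", "r1", 2), ("r0", "r2", 3)]


class Spec(NamedTuple):          # specification state: the observables
    rf: Dict[str, int]
    pc: int


class Impl(NamedTuple):          # implementation state: observables + one latch
    rf: Dict[str, int]
    pc: int
    pending: Optional[Instr]     # fetched but not yet written back


def execute(rf: Dict[str, int], instr: Instr) -> Dict[str, int]:
    dest, src, imm = instr
    return {**rf, dest: rf[src] + imm}


def a_step(s: Spec) -> Spec:
    """Specification transition: execute one instruction atomically."""
    return Spec(execute(s.rf, PROGRAM[s.pc % len(PROGRAM)]), s.pc + 1)


def i_step(q: Impl, no_op: bool) -> Impl:
    """Implementation transition: retire the latched instruction, then fetch."""
    rf = execute(q.rf, q.pending) if q.pending is not None else q.rf
    if no_op:
        return Impl(rf, q.pc, None)
    return Impl(rf, q.pc + 1, PROGRAM[q.pc % len(PROGRAM)])


def abs_fn(q: Impl) -> Spec:
    """Abstraction: complete the unfinished instruction, then project."""
    rf = execute(q.rf, q.pending) if q.pending is not None else q.rf
    return Spec(rf, q.pc)


def sync(q: Impl, no_op: bool) -> int:
    """Synchronization function: number of specification steps to take."""
    return 0 if no_op else 1


def commutes(q: Impl, no_op: bool) -> bool:
    s = abs_fn(q)
    for _ in range(sync(q, no_op)):
        s = a_step(s)
    return abs_fn(i_step(q, no_op)) == s
```

On a flushed state (`pending` is `None`) `abs_fn` coincides with plain projection, which mirrors the projection condition stated above.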

The completion functions approach suggests a way of constructing the ab- 
straction function. We define a completion function for every unfinished instruc- 
tion in the processor that directly specifies the intended effect of completing that 
instruction. The abstraction function is defined as a composition of these com- 
pletion functions in program order. In the examples in [HSG98], the program 
order was determined from the structure of the pipeline. This construction of 
the abstraction function decomposed the proof into proving a series of verifi- 
cation conditions, each of which captured the effect of completing instructions 
one at a time and that were reused in the proof of the subsequent verification
conditions. Since there were a fixed (and small) number of instructions pending
in the pipeline, this scheme worked well and the proof was easily accomplished. 

However, the number of instructions is unbounded in the present example 
and the above scheme does not work. But we observe that a pending instruction 
in the processor can only be in four possible states and provide a parameterized 
completion function using this fact. The program order is easily determined since 
the reorder buffer stores it. And we generate a single parameterized verification 
condition which is proved by an induction on the number of pending instruc- 
tions in the reorder buffer, where the induction hypothesis captures the effect of 
completing all the earlier instructions. 

4 Proof of Correctness 

We introduce some notations which will be used throughout this section: q rep- 
resents the implementation state, s the scheduler output, i the processor input, 
rf(q) the register file contents in state q and I_step(q,s,i) the “next state”
after an implementation transition. Also, we identify an instruction in the pro- 
cessor by its reorder buffer entry index (i.e., instruction rbi means instruction 
at index rbi). The complete PVS specifications and the proof scripts can be 
found at [Hos99]. 

4.1 Specifying the completion functions 

An instruction in the processor can be in one of the following four possible 
states inside the processor — issued, dispatched, executed or written back. (A 
retired instruction is no longer present in the processor). We formulate predicates 
describing an instruction in each of these states and identify how to complete 
such an instruction. To facilitate this formulation, we add two auxiliary variables 
to a reorder buffer entry.² The first one maintains the index of the dispatch
buffer entry allocated to the instruction while it is waiting to be dispatched. The 
second one maintains the execution unit index where the instruction executes. 
The definition of the completion function is shown in [1].

% state_I: implementation state type. rbindex: reorder buffer index type.   [1]
Complete_instr(q:state_I, rbi:rbindex): state_I =
  IF    written_back_predicate(q,rbi) THEN Action_written_back(q,rbi)
  ELSIF executed_predicate(q,rbi)     THEN Action_executed(q,rbi)
  ELSIF dispatched_predicate(q,rbi)   THEN Action_dispatched(q,rbi)
  ELSIF issued_predicate(q,rbi)       THEN Action_issued(q,rbi)
  ELSE q ENDIF

In this implementation, when the instruction is in the written back state, the
result value as well as the destination register of the instruction are in its
reorder buffer entry. So Action_written_back above completes this instruction by
updating the register file by writing the result value to the destination register.
An instruction in the issued state is completed (Action_issued) by reading the
value of the source register from the register file, computing the result value
depending on the instruction operation, and then writing this value to the
destination register. This relies on the fact that the completion functions will be
composed in program order in defining the abstraction function, so q for a given
instruction will be the state where the instructions ahead of it are completed.
Action_executed and Action_dispatched are specified similarly. None of these
“actions” affect the program counter or the instruction memory. The completion
function definition is very compact, taking only 15 lines of PVS code.

² The auxiliary variables are for specification purposes only. The third auxiliary
variable we needed maintained the identity of the source register for a given instruction.
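The case split performed by the completion function can be mimicked in a small executable model. The Python sketch below is only an illustration under simplifying assumptions, not the PVS definition: the `Entry` fields, the two opcodes in `OPS`, and the idea that a dispatched entry carries its captured `operand` while an executed or written back entry carries its `result` are all hypothetical stand-ins for the predicates and actions named above.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, Optional

# Hypothetical single-operand operations for the toy model.
OPS = {"inc": lambda x: x + 1, "dbl": lambda x: 2 * x}


class Stage(Enum):
    ISSUED = "issued"
    DISPATCHED = "dispatched"
    EXECUTED = "executed"
    WRITTEN_BACK = "written back"


@dataclass(frozen=True)
class Entry:
    stage: Stage
    op: str
    src: str
    dest: str
    operand: Optional[int] = None   # assumed captured at dispatch
    result: Optional[int] = None    # assumed available once executed


@dataclass(frozen=True)
class State:
    rf: Dict[str, int]              # register file
    rob: Dict[int, Entry]           # reorder buffer, indexed by rbi


def complete_instr(q: State, rbi: int) -> State:
    """Complete instruction rbi atomically; only the register file changes."""
    e = q.rob.get(rbi)
    if e is None:                               # ELSE branch: nothing to complete
        return q
    if e.stage is Stage.WRITTEN_BACK:           # result sits in the buffer entry
        value = e.result
    elif e.stage is Stage.EXECUTED:             # result computed, not yet written back
        value = e.result
    elif e.stage is Stage.DISPATCHED:           # operand captured, compute result
        value = OPS[e.op](e.operand)
    else:                                       # ISSUED: read source from register file
        value = OPS[e.op](q.rf[e.src])
    return replace(q, rf={**q.rf, e.dest: value})
```

Reading the source from `q.rf` in the issued case is sound only if `q` already reflects the completion of all earlier instructions, which is the ordering assumption discussed in the text.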



4.2 Constructing the abstraction function 



The abstraction function is constructed by flushing the reorder buffer, that is, 
by completing all the unfinished instructions in the reorder buffer. We define a 
recursive function Complete_till to complete instructions till a given reorder 
buffer index as shown in [2] and then construct the abstraction function by
instantiating this definition with the index of the latest instruction in the reorder
buffer (i.e., rb_front(q)-1). The synchronization function returns zero if the no_op
input is asserted or there is no free dispatch buffer entry (hence no instruction
is issued) and returns one otherwise.



% If the given instr. index is less than the end pointer of the             [2]
% reorder buffer, do nothing. Else complete that instr. in a state
% where all the previous instructions are completed.
Complete_till(q:state_I, rbi:rbindex): RECURSIVE state_I =
  IF rbi < rb_end(q) THEN q
  ELSE Complete_instr(Complete_till(q,rbi-1),rbi) ENDIF
MEASURE rbi

% state_A is the specification state type.
ABS(q:state_I): state_A = projection(Complete_till(q,rb_front(q)-1))
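The recursion can be traced in a few lines of Python. This is a minimal sketch under stated assumptions rather than a translation of the PVS theory: each reorder buffer entry is represented by a hypothetical callable that maps a register file to the register file after completing that instruction (standing in for Complete_instr), and `projection` keeps only the register file and program counter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

RegFile = Dict[str, int]


@dataclass
class State:
    rf: RegFile
    pc: int
    rob: Dict[int, Callable[[RegFile], RegFile]]  # rbi -> completion effect (assumed)
    rb_end: int      # index of the earliest pending instruction
    rb_front: int    # index of the first free slot


def complete_instr(q: State, rbi: int) -> State:
    # Stand-in for Complete_instr: apply the entry's effect to the register file.
    return State(q.rob[rbi](q.rf), q.pc, q.rob, q.rb_end, q.rb_front)


def complete_till(q: State, rbi: int) -> State:
    """Complete instructions rb_end .. rbi in program order."""
    if rbi < q.rb_end:
        return q
    return complete_instr(complete_till(q, rbi - 1), rbi)


def projection(q: State) -> Tuple[RegFile, int]:
    return (q.rf, q.pc)


def abs_fn(q: State) -> Tuple[RegFile, int]:
    """Abstraction function: flush the whole reorder buffer, then project."""
    return projection(complete_till(q, q.rb_front - 1))
```

For an empty buffer (`rb_end == rb_front`) the recursion bottoms out immediately, so `abs_fn` reduces to `projection`, as required of flushed states.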



4.3 Proof decomposition 

We first prove a single parameterized verification condition that captures the 
effect of completing all the instructions in the reorder buffer and then use it in the 
proof of the commutative diagram. We decompose the proof of this verification 
condition based on how an instruction makes a transition from its present state 
to its next state. 

Consider an arbitrary instruction rbi. We claim that the register file contents
will be the same whether the instructions till rbi are completed in state q or
in I_step(q,s,i). This is shown as lemma same_rf in [3]. We prove this by
induction on rbi.

% The single parameterized verification condition.                         [3]
% valid_rb_entry? predicate tests if rbi is within reorder buffer bounds.
same_rf: LEMMA
  FORALL(rbi: rbindex): valid_rb_entry?(q,rbi) IMPLIES
    rf(Complete_till(q,rbi)) = rf(Complete_till(I_step(q,s,i),rbi))

We generate the different cases of the induction argument (as detailed later)
based on how an instruction makes a transition from its present state to its next
state. This is shown in Figure 3 where we have identified the conditions under
which an instruction changes its state. For example, we identify the predicate
Dispatch_trans?(q,s,i,rbi) that defines the condition under which the
instruction rbi goes from issued state to dispatched state. In this implementation,
this predicate is true when there is an execution unit for which the Dispatch?
output from the scheduler is true and the Dispatch_slot output is equal to the
dispatch buffer entry index assigned to rbi. The other “trans” predicates
are defined similarly.



[Figure 3: state diagram. Entry -> I -> D -> E -> W -> Exit. The forward
transitions are labeled Dispatch_trans?, Execute_trans?, Writeback_trans? and
Retire_trans?; the self-loops on I, D, E and W are labeled NOT Dispatch_trans?,
NOT Execute_trans?, NOT Writeback_trans? and NOT Retire_trans?.]

Fig. 3. The various states an instruction can be in and transitions between them. I:
issued, D: dispatched, E: executed, W: written back.

Having defined these predicates, we prove that they indeed cause instructions
to take the transitions shown. Consider a valid instruction rbi in the issued state,
that is, issued_predicate(q,rbi) holds. If Dispatch_trans?(q,s,i,rbi) is
true, then we show that after an implementation transition, rbi will be in the
dispatched state (i.e., dispatched_predicate(I_step(q,s,i),rbi) is true) and
remains valid. This is shown as a lemma in [4]. If Dispatch_trans?(q,s,i,rbi)
is false, we show that rbi remains in the issued state in I_step(q,s,i) and
remains valid. There are five other similar lemmas for the other transitions.
In the eighth case, that is, rbi in the written back state being retired, the
instruction will be invalid (out of reorder buffer bounds) in I_step(q,s,i).

issue_to_dispatch: LEMMA                                                   [4]
  FORALL(rbi: rbindex): (valid_rb_entry?(q,rbi) AND
    issued_predicate(q,rbi) AND Dispatch_trans?(q,s,i,rbi)) IMPLIES
      (dispatched_predicate(I_step(q,s,i),rbi) AND
       valid_rb_entry?(I_step(q,s,i),rbi))
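
The eight transition lemmas correspond to four instruction states, each with its “trans” predicate either true or false. A minimal Python sketch of that case split follows; the Stage enum and the step function are illustrative names, not part of the PVS development, and None stands for an instruction that has left the reorder buffer.

```python
from enum import Enum
from itertools import product


class Stage(Enum):
    ISSUED = "I"
    DISPATCHED = "D"
    EXECUTED = "E"
    WRITTEN_BACK = "W"


# Successor of each stage when its "trans" predicate holds.
NEXT_STAGE = {
    Stage.ISSUED: Stage.DISPATCHED,
    Stage.DISPATCHED: Stage.EXECUTED,
    Stage.EXECUTED: Stage.WRITTEN_BACK,
    Stage.WRITTEN_BACK: None,  # retired: no longer a valid reorder buffer entry
}


def step(stage, trans_holds):
    """Stage of an instruction after one implementation transition."""
    return NEXT_STAGE[stage] if trans_holds else stage


# One lemma per (stage, predicate value) pair.
CASES = list(product(Stage, (True, False)))
```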

Now we come back to the details of the induction argument for the same_rf lemma.
We do a case analysis on the possible states rbi can be in and on whether or not
it makes a transition to its next state. Assume the instruction rbi is in the
issued state. We prove the induction claim separately for the two cases,
Dispatch_trans?(q,s,i,rbi) true or false. (The proof obligation for the first
case is shown in [5].) We have similar proof obligations for rbi being in other
states. In all, the proof decomposes into eight very similar proof obligations.

% One of the eight cases in the induction argument.                        [5]
issue_to_dispatch_induction: LEMMA
  FORALL(rbi: rbindex): (valid_rb_entry?(q,rbi) AND
    issued_predicate(q,rbi) AND Dispatch_trans?(q,s,i,rbi) AND
    Induction_hypothesis(q,s,i,rbi-1)) IMPLIES
      rf(Complete_till(q,rbi)) = rf(Complete_till(I_step(q,s,i),rbi))

We now sketch the proof of the issue_to_dispatch_induction lemma. (We refer
to the goal that we are proving, rf(...) = rf(...), as the consequent.)
We expand the definition of the completion function corresponding to
rbi on both sides of the consequent (after unrolling the recursive definition of
Complete_till once). It follows from the issue_to_dispatch lemma that since rbi
is in the issued state in q, it is in the dispatched state in I_step(q,s,i). After
rewriting and simplifications in PVS, the left hand side of the consequent
simplifies to rf(Action_issued(Complete_till(q,rbi-1),rbi))^ and the right hand
side to rf(Action_dispatched(Complete_till(I_step(q,s,i),rbi-1),rbi))
(illustrated in Figure 4). The proof now proceeds by expanding the definitions of
Action_issued and Action_dispatched, using the necessary invariants and
simplifying. We use many simple PVS strategies during the proof; in particular we
use (apply (then* (repeat (lift-if)) (bddsimp) (ground) (assert))) to
do the simplifications by automatic case analysis. Observe that when we expand
Action_dispatched, all implementation variables take their “next” values. Also,
on the left hand side of the consequent the term rf(Complete_till(q,rbi-1))
appears, and on the right hand side the term
rf(Complete_till(I_step(q,s,i),rbi-1)) appears; these are the same by the
induction hypothesis.




Fig. 4. The reorder buffer and the state of the instructions in it before and after an
implementation transition (one possible configuration, empty slot means no instruction
present). Completing a particular instruction reduces to performing the action shown.

We now instantiate the lemma same_rf above with the index of the latest
instruction in the processor (i.e., rb_front(q)-1) and use it in the proof of the
commutative diagram for the register file. Assume that no instruction is issued in
the current cycle, that is, the synchronization function returns zero. Then rb_front
remains unchanged after an implementation transition and the proof is trivial. If
indeed a new instruction is issued, then it will be at index rb_front(q) and will
be in the issued state, so proving the commutative diagram reduces to establishing
that completing the new instruction (as per Action_issued) has the same effect
on the register file as executing a specification machine transition. This proof is
similar to the proof of the lemma described above. The commutative diagram
proofs for pc and the instruction memory are trivial and are omitted.

^ Observe that issued_predicate(Complete_till(q,rbi-1),rbi) holds if and only if
issued_predicate(q,rbi) holds. This is because the completion functions affect only
the register file (observables in general) and issued_predicate depends only on the
non-observables.
Correctness of feedback logic: The proof presented above requires that the
correctness of the feedback logic be captured in the form of a lemma as shown in
[6]. This lemma states that if the source operand of an instruction is ready, then
its value is equal to the value read from the register file after all the
instructions ahead of it are completed. When an instruction rbi in the issued state
is being dispatched, it uses src_value(q,rbi) as the source operand but the
Action_issued that is used to complete it reads the source value from the
register file (see the description of Action_issued in Section 4.1) and this lemma
establishes that these two values are the same. The proof of this lemma relies
on an invariant described later.

% select reads from the register file. src_ready?, src_value and           [6]
% src_reg have their obvious definitions.
Feedback_logic_correct: LEMMA
  FORALL(rbi: rbindex): (valid_rb_entry?(q,rbi) AND
    issued_predicate(q,rbi) AND src_ready?(q,rbi)) IMPLIES
      src_value(q,rbi) = select(rf(Complete_till(q,rbi-1)), src_reg(q,rbi))

Invariants needed: We now provide a classification of the invariants needed
by our approach and describe some of them.

• Exhaustiveness and Exclusiveness: Having identified the set of possible states
an instruction can be in, we require one to prove that an arbitrary instruction
is always in one of these states (exhaustiveness) and never simultaneously in
two states (exclusiveness).

• Instruction state properties: Whenever an instruction is in a particular state,
it satisfies some properties and these are established as invariants. One
example is: if an instruction is in the issued state, then the dispatch buffer entry
assigned to it is valid and has the reorder buffer index of the instruction
stored in it.

• Feedback logic invariant: Let rbi be an arbitrary instruction and let ptr
be an instruction that is producing its source value. Then this invariant
essentially states that all the instructions “in between” rbi and ptr have
a destination different from the source of rbi, that ptr is in the written
back state if and only if the source value of rbi is ready, and that the source
value of rbi (when ready) is equal to the result computed by ptr.

• Example specific invariants: Other invariants needed include characterizations
of the reorder buffer bounds and the register translation table.

PVS proof details: The proofs of all the induction obligations follow the
pattern outlined in the sketch of the issue_to_dispatch_induction lemma. The proofs
of certain rewrite rules needed in the methodology [HSG98] and other simple
obligations can be accomplished fully automatically. But there is no uniform
strategy for proving the invariants. The manual effort involved one week of
discussion and planning and then 12 person days of “first time” effort to construct
the proofs. The proofs were subsequently cleaned up and evolved as we wrote
the paper. The proofs rerun in about 1050 seconds on a 167 MHz Ultra Sparc
machine.

4.4 Other obligations - liveness properties

We provide a sketch of the proof that the processor eventually gets flushed if 
no more instructions are inserted into it. The proof that the synchronization 
function eventually returns a nonzero value is similar. The proofs involve a set 
of obligations on the implementation machine, a set of fairness assumptions on 
the inputs to the implementation and a high level argument using these to prove 
the two liveness properties. All the obligations on the implementation machine 
are proved in PVS. We now provide a brief sketch (due to space constraints) of 
the top level argument which is being formalized in PVS. 

Proof sketch: The processor is flushed if rb_front(q) = rb_end(q). 

• First observation: “any instruction in the dispatched state eventually goes
to the executed state and then eventually goes to the written back state. It
then remains in the written back state until retired”. Consider an instruction
rbi in the dispatched state. If Execute_trans?(q,s,i,rbi) is true, then
rbi goes to the executed state in I_step(q,s,i), otherwise it remains in
the dispatched state (refer to Figure 3). We show that when rbi is in the
dispatched state, the scheduler inputs that determine when an instruction
should be executed are enabled and these remain enabled as long as rbi is in
the dispatched state. By a fairness assumption on the scheduler, it eventually
decides to execute the instruction (i.e., Execute_trans?(q,s,i,rbi) will be
true) and the instruction moves to the executed state. By a similar argument,
it moves to the written back state and then remains in that state until retired.

• Second observation: “every busy execution unit eventually becomes free and 
stays free until an instruction is dispatched on it” . 

• Third observation: “an instruction in the issued state will eventually go to 
the dispatched state” . Here, the proof is by induction since an arbitrary in- 
struction rbi could be waiting for a previously issued instruction to produce 
its source value. This step also relies on the earlier two observations. 

• Final observation: “the processor eventually gets flushed”. We know that 
every instruction eventually goes to the written back state — third and first 
observations. Also the instructions in the written back state are eventually 
retired by a fairness assumption on the scheduler. Since rb_front(q) re- 
mains unchanged when no new instructions are inserted into the processor 
and rb_end(q) is incremented when an instruction is retired, eventually the 
processor gets flushed. 

5 Conclusions 

We have demonstrated in this paper that the completion functions approach is 
well-suited for the verification of out-of-order execution processors with a reorder 
buffer. We have recently extended our approach to be applicable in a scenario 
where instructions “commit” out-of-order and illustrated it on an example pro- 
cessor implementing Tomasulo’s algorithm without a reorder buffer [HGS99]. 
The proof was constructed in seven person days, reusing many of the ideas and
the machinery developed in this paper. We are currently working on verifying
a more detailed out-of-order execution processor involving branches, exceptions
and speculative execution. Our approach has been used to handle processors with
branch and memory operations [HSG98] and we are investigating how those ideas 
carry over to this example. We are also developing a PVS theory of the “eventu- 
ally” temporal operator to mechanize the liveness proofs presented in this paper. 
Finally, we are investigating how the ideas behind the completion functions ap- 
proach can be adapted to verify certain “transaction processing systems” . 
Acknowledgments: We would like to thank Abdel Mokkedem and John Rushby
for their feedback on earlier drafts of this paper.

Abstract. In [1] Bounded Model Checking with the aid of satisfiability
solving (SAT) was introduced as an alternative to symbolic model checking
with BDDs. In this paper we show how bounded model checking can
take advantage of specialized optimizations. We present a bounded
version of the cone of influence reduction. We have successfully applied this
idea in checking safety properties of a PowerPC microprocessor at
Motorola’s Somerset PowerPC design center. Based on that experience, we
propose a verification methodology that we feel can bring model checking
into the mainstream of industrial chip design.



1 Introduction 

Model checking has only been partially accepted by industry as a supplement to 
traditional verification techniques. The reason is that model checking, which, to 
date, has been based on BDDs or on explicit state graph exploration, has not 
been robust enough for industry. 

Model checking [3,12] was first proposed as a verification technique eighteen
years ago. However, it was not until the discovery of symbolic model checking
techniques based on BDDs [2,5,10] around 1990 that it was taken seriously by
industry. Unfortunately, BDD based model checkers have suffered from the fact
that ordered binary decision diagrams can require exponential space. Recently
a new technique called bounded model checking [1] has been proposed that uses
fast satisfiability solvers instead of BDDs. The advantage of satisfiability solvers
like SATO [15], GRASP [13], and Stalmarck’s algorithm [14] is that they never
require exponential space. In [1], it was shown that this new technique sometimes
performed much better than BDD based symbolic model checking. However, the
performance was obtained on academic examples, and doubt remained about
whether bounded model checking would work well on industrial examples.

In this paper we consider the performance of a bounded model checker, BMC
[1], in verifying twenty safety properties on five complex circuits from a PowerPC
microprocessor. By any reasonable measure, BMC consistently outperformed the
BDD based symbolic model checker, SMV [9]. In part, this performance gain was
obtained by utilizing a new bounded cone of influence reduction technique which
reduces the size of the CNF (conjunctive normal form) formula given to the
satisfiability solver.

We believe our new experimental results confirm that bounded model check- 
ing can handle industrial examples. Since we, ourselves, are convinced of this, we 
propose, here, a methodology for using bounded model checking as a supplement 
to traditional validation techniques in industry. We feel that this represents a 
significant milestone for formal verification. 



2 Models, Kripke Structures and Safety Properties 

For brevity, we focus on the application of bounded model checking to safety 
properties. The reader is referred to [1] for a more complete treatment of bounded 
model checking. 

We first consider models that can be represented by a set of initial and next 
state functions. 

Definition 1 (Model). Let $X = \{x_1, \ldots, x_n, \ldots, x_m\}$ be a set of $m$
Boolean variables, and let $F = \{f_1, \ldots, f_n\}$ be a set of $n \le m$ Boolean
transition functions, each a function over variables in $X$. Finally, let
$R = \{r_1, \ldots, r_n\}$ be a set of initialization functions, each a function over
variables in $X$. Then $M = (X, F, R)$ is called a model.

From a model $M$ we can construct a Kripke structure $K = (S, T, I)$ in the
following way. The set of states, $S$, is an encoding of the variables in $X$, i.e.,
$S = \{0,1\}^m$. A state may also be considered a vector of these $m$ variables,
$\bar{x} = (x_1, \ldots, x_n, x_{n+1}, \ldots, x_m)$. Note that we use italic identifiers
$s, s_0, \ldots$ for states (elements of $S = \{0,1\}^m$) and overhead bar identifiers
$\bar{s}, \bar{s}_0$ for vectors of Boolean variables. We define present and next state
versions of the variables in $X$, denoting the latter with primes, e.g., $x_j'$. The
variables in $X$ serve as atomic propositions, and obviate the need for a labeling
function. We define the transition relation, $T \subseteq S \times S$, and the set of
initial states, $I \subseteq S$, via their characteristic functions:

$$T(s, s') := \bigwedge_{j=1}^{n} x_j' \leftrightarrow f_j(\bar{x}) \qquad I(s) := \bigwedge_{j=1}^{n} x_j \leftrightarrow r_j(\bar{x})$$

Here, $f_j$ and $r_j$ are the transition and initialization functions, respectively, of
the $j$-th element of the variable vector, $\bar{x}$. Note that transition and
initialization functions are not specified for elements $n+1$ through $m$ of $\bar{x}$.
These represent primary inputs (PIs) to an underlying sequential circuit.
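
To make the construction concrete, the following Python sketch builds the characteristic functions T and I from lists of transition and initialization functions. The helper name make_kripke and the toy model (one latch that toggles whenever a primary input is 1) are assumptions for illustration; states are tuples of 0/1 values with 0-based indexing.

```python
def make_kripke(n, f, r):
    """Return characteristic functions (T, I) for a model with n latches.

    f and r are lists of n callables over a full state vector; components
    with index n or higher are primary inputs and stay unconstrained.
    """
    def T(s, s_next):
        return all(s_next[j] == f[j](s) for j in range(n))

    def I(s):
        return all(s[j] == r[j](s) for j in range(n))

    return T, I


# Toy model with m = 2 variables and n = 1 latch:
# x[0] is a latch that toggles when the primary input x[1] is 1.
f = [lambda x: x[0] ^ x[1]]
r = [lambda x: 0]
T, I = make_kripke(1, f, r)
```

Here T((0, 1), (1, 0)) and T((0, 1), (1, 1)) both hold, since the next value of the input is free, while T((0, 0), (1, 0)) does not.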

In practice, we will often consider a set of propositional constraints imposed
on a system. Given a model, $M = (X, F, R)$, a constraint function, $c$, over $X$,
and a Kripke structure, $K = (S, T, I)$, derived from $M$, a constrained Kripke
structure, $K_c = (S, T_c, I_c)$, in which $c$ is an invariant, can be obtained as
follows:

$$T_c(s, s') := T(s, s') \wedge c(s) \wedge c(s') \quad \text{and} \quad I_c(s) := I(s) \wedge c(s)$$
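
A minimal Python sketch of the constrained structure, written over characteristic functions. The function constrain, the toy T and I (a toggling latch with one input) and the constraint c are all hypothetical and serve only to show how the conjunctions are formed.

```python
def constrain(T, I, c):
    """Build T_c and I_c from T, I and a constraint c on single states."""
    def T_c(s, s_next):
        return T(s, s_next) and c(s) and c(s_next)

    def I_c(s):
        return I(s) and c(s)

    return T_c, I_c


# Toy structure: latch s[0] toggles when input s[1] is 1; the latch starts at 0.
def T(s, s_next):
    return s_next[0] == (s[0] ^ s[1])


def I(s):
    return s[0] == 0


# Hypothetical constraint: latch and input are never 1 at the same time.
def c(s):
    return not (s[0] == 1 and s[1] == 1)


T_c, I_c = constrain(T, I, c)
```

The transition from (0, 1) to (1, 1) is allowed by T but excluded by T_c, because the target state violates c.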

As a specification logic we use Linear Temporal Logic (LTL). In this paper
we consider only the unary temporal operators eventually, F, and globally, G.

A path $\pi = (s_0, s_1, \ldots)$ in a model $M$ is an infinite sequence of states in
the corresponding Kripke structure $K$ such that $T(s_i, s_{i+1})$ holds for all
$i \in \mathbb{N}$. We call $\pi$ initialized if $I(s_0)$ holds. It is often convenient to
discuss the value of a component variable from the underlying vector, $\bar{x}$, in a
certain state along a path. The assignment to element $x_j$ of $\bar{x}$ in state $s_i$
along path $\pi$ is written as

We are interested in determining whether $M \models AG\,p$ holds, i.e., whether $p$,
a propositional formula, holds in every state along every initialized path in some
model, $M$. We approach this in two ways: (a) by searching for a finite length
counterexample showing $M \models EF\,\neg p$, or (b) by proving that $p$ is an
inductive invariant for $M$. These have in common that, in both cases, it is not
necessary to search up to the diameter of the structure, the diameter being the
minimal number of transitions sufficient for reaching any state from an initial state.

3 Bounded Model Checking for Safety Properties 

In bounded model checking [1] the user specifies a number of time steps, $k$,
for searching from initial states. A propositional formula is then generated by
introducing $k+1$ vectors of state variables, each representing a state in the
prefix of length $k$, $\bar{s}_0, \ldots, \bar{s}_k$. Then the transition relation is
unrolled $k$ times, substituting for states the appropriately labeled state variable
vectors:

$$[\![ M ]\!]_k := I(\bar{s}_0) \wedge T(\bar{s}_0, \bar{s}_1) \wedge \cdots \wedge T(\bar{s}_{k-1}, \bar{s}_k) \qquad (1)$$

Every initialized path of the model $M$ corresponds to an assignment that satisfies
(1). When checking a safety property, $G\,p$, where $p$ is a propositional formula,
we search for a witness to $f = F\,q$, where $q = \neg p$. A satisfying assignment to
(1) can be extended to a path that is a witness for $f$ (and a counterexample for
$G\,p$), iff $q$ holds at one of the $k+1$ states or equivalently the assignment also
satisfies:

$$[\![ f ]\!]_k := q(\bar{s}_0) \vee q(\bar{s}_1) \vee \cdots \vee q(\bar{s}_k) \qquad (2)$$

The final step is to translate the conjunction of (1) and (2) into CNF and check
it with SAT tools such as [13,15,14]. Translation into CNF is described in [11].
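
The unrolling can be illustrated without a SAT solver by enumerating all assignments to the $k+1$ state vectors and checking (1) and (2) directly. This brute-force Python sketch is only a stand-in for the CNF translation and SAT call; the two-bit counter model and the function name bmc are assumptions for this example.

```python
from itertools import product

NUM_VARS = 2  # toy model: two-bit counter, s[0] is the low bit


def T(s, t):
    """Transition relation of the counter: increment modulo 4."""
    return t[0] == 1 - s[0] and t[1] == (s[1] ^ s[0])


def I(s):
    return s == (0, 0)


def q(s):
    """Negation of the safety property p = 'the counter never reaches 3'."""
    return s == (1, 1)


def bmc(k):
    """Return an assignment satisfying (1) and (2) for bound k, or None."""
    states = list(product((0, 1), repeat=NUM_VARS))
    for path in product(states, repeat=k + 1):
        unrolled = I(path[0]) and all(T(path[i], path[i + 1]) for i in range(k))
        violated = any(q(s) for s in path)
        if unrolled and violated:
            return list(path)
    return None
```

For this toy model no counterexample exists for bounds below 3, and bmc(3) returns the single initialized path that reaches the bad state.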

4 Classical and Bounded Cone of Influence Reduction 

The Cone of Influence Reduction is a well known technique^. For bounded model
checking this technique can be specialized to the Bounded Cone of Influence
Reduction, described below.

The basic idea of COI reduction is to construct a dependency graph of the 
state variables, rooted at the variables in the specification. The set of state 
variables in the graph is called the COI of the specification. In this paper, we 
call this the “classical” COI reduction. Variables not in the classical COI can 
not influence the validity of the specification and can therefore be removed from 
the model. 
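
Classical COI reduction amounts to a reachability computation on the dependency graph. The sketch below assumes the graph is given as a dictionary from each variable to the variables in the support of its transition function; the function name classical_coi and the four-variable example are illustrative only.

```python
def classical_coi(support, spec_vars):
    """Variables reachable from spec_vars in the dependency graph."""
    coi = set()
    stack = list(spec_vars)
    while stack:
        v = stack.pop()
        if v not in coi:
            coi.add(v)
            stack.extend(support.get(v, ()))
    return coi


# Hypothetical dependencies: a depends on b, b on a and c, d on a.
support = {"a": {"b"}, "b": {"a", "c"}, "c": set(), "d": {"a"}}
kept = classical_coi(support, {"a"})
removed = set(support) - kept
```

In this example d reads a but nothing in the cone of a reads d, so d can be dropped from the model.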

Let $dep(x)$ be the set of successors to variable $x$ in the state variable
dependency graph, i.e., the set of variables in the support of the transition function
for $x$. The Bounded Cone of Influence Reduction is based on the observation
that, for any state $s_k$ along a path, the value of an arbitrary state variable, $x$,
in the associated state variable vector, $\bar{s}_k$, can depend only on state variables
in state variable vector $\bar{s}_j$, with $j < k$. Thus, it is only the copies, in
$\bar{s}_{k-1}$, of the variables that are in $dep(x)$ that can determine the value of
$x$ in $\bar{s}_k$. Other state variables, and their corresponding transition functions,
can be removed. If we are looking for violations of a safety property at state $s_k$,
this argument can be repeated, working backwards, until the initial state is reached.

For instance, consider the following model with five state variables
$x_1, \ldots, x_5$ and transition functions

$$f_1 = 1, \quad f_2 = x_1, \quad f_3 = x_2, \quad f_4 = x_3, \quad f_5 = x_4$$

Assume the state variables are initialized to constants:

$$r_1 = 0, \quad r_2 = 1, \quad r_3 = 1, \quad r_4 = 1, \quad r_5 = 1$$

This model has only one execution sequence in which the 0 value is moved from
$x_1$ to $x_5$. After the 0 has reached $x_5$ it vanishes, and all state variables stay
at 1.

$$01111 \to 10111 \to 11011 \to 11101 \to 11110 \to 11111 \to \cdots$$

^ Cone of influence reduction seems to have been discovered and utilized by a number
of people, independently. We note that it can be seen as a special case of Kurshan’s
localization reduction [8].

If the property to check is the safety property that $x_4$ is always true, i.e.,
$G\,x_4$, classical COI reduction would remove just $x_5$. Now, a counterexample for
this property can be found by unrolling the transition relation three times. Let us
assume that we only want to check for a counterexample in the last state, $s_3$.
To apply bounded COI we observe that $x_4$ in $\bar{s}_3$ only depends on $x_3$ in
$\bar{s}_2$, which in turn depends on $x_2$ in $\bar{s}_1$, which only depends on the
initial value of $x_1$. Therefore we can remove all other variables and their
corresponding transitions. This application of bounded COI reduction results in the
following formula:

$$\bar{s}_0(1) \leftrightarrow 0 \;\wedge\; \bar{s}_1(2) \leftrightarrow \bar{s}_0(1) \;\wedge\; \bar{s}_2(3) \leftrightarrow \bar{s}_1(2) \;\wedge\; \bar{s}_3(4) \leftrightarrow \bar{s}_2(3) \;\wedge\; \neg \bar{s}_3(4)$$

This formula is satisfiable, and its only satisfying assignment can be extended
to a counterexample for the original formula, $G\,x_4$. Without bounded COI, 12
more equalities would have been necessary.
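
The example can be checked mechanically. The following Python sketch simulates the five-variable model and enumerates all assignments to the four variables that survive the bounded reduction; the function names are chosen for this illustration and the enumeration stands in for a SAT call.

```python
from itertools import product


def step(s):
    """One transition of the example: f1 = 1, f2 = x1, ..., f5 = x4."""
    x1, x2, x3, x4, x5 = s
    return (1, x1, x2, x3, x4)


def run(k):
    """The unique initialized path prefix with k transitions."""
    s = (0, 1, 1, 1, 1)
    path = [s]
    for _ in range(k):
        s = step(s)
        path.append(s)
    return path


def reduced_formula(a, b, c, d):
    """a = s0(1), b = s1(2), c = s2(3), d = s3(4)."""
    return a == 0 and b == a and c == b and d == c and not d


SOLUTIONS = [v for v in product((0, 1), repeat=4) if reduced_formula(*v)]
```

The single solution assigns 0 to all four variables, which agrees with the values of $x_1$, $x_2$, $x_3$ and $x_4$ at steps 0, 1, 2 and 3 of the simulated path.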

For a formal treatment of the bounded COI reduction we define the bounded
dependency set, $bdep(\bar{s}_i(j))$, of a component, $\bar{s}_i(j)$, of state variable
vector, $\bar{s}_i$, as follows. Here, $\bar{s}_i$ represents a state $s_i$ along a path
prefix:

$$bdep(\bar{s}_i(j)) := \text{if } i = 0 \text{ then } \emptyset \text{ else } \{\bar{s}_{i-1}(l) \mid x_l \in dep(x_j)\}$$

The bounded COI, $bcoi(\bar{s}_i(j))$, of component $\bar{s}_i(j)$ is defined, recursively,
as the least set of variables that includes $\bar{s}_i(j)$ and includes, for each
$\bar{s}_{i-1}(l) \in bdep(\bar{s}_i(j))$, if any, the variables in $bcoi(\bar{s}_{i-1}(l))$.

For a fixed $k$, the length of the considered prefix, we define the bounded COI
of an LTL formula, $f$, as:

$$bcoi(k, f) := \{x \in bcoi(\bar{s}_i(j)) \mid \bar{s}_i(j) \in var([\![ f ]\!]_k)\}$$

where $var([\![ f ]\!]_k)$ is the set of variables of $[\![ f ]\!]_k$.
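
The definitions translate almost directly into code. In this Python sketch a component $\bar{s}_i(j)$ is represented as the pair (i, j), and DEP encodes the dependency sets of the five-variable example; these representations and the function names are assumptions made for illustration.

```python
# dep(x_j) for the example: x1 is constant, x_j reads x_{j-1} for j > 1.
DEP = {1: set(), 2: {1}, 3: {2}, 4: {3}, 5: {4}}


def bdep(i, j, dep):
    """Bounded dependency set of component (i, j)."""
    if i == 0:
        return set()
    return {(i - 1, l) for l in dep[j]}


def bcoi(i, j, dep):
    """Bounded cone of influence of component (i, j)."""
    result = {(i, j)}
    for prev_i, l in bdep(i, j, dep):
        result |= bcoi(prev_i, l, dep)
    return result


def bcoi_formula(formula_vars, dep):
    """Union of the bounded cones of all components occurring in [[ f ]]_k."""
    result = set()
    for i, j in formula_vars:
        result |= bcoi(i, j, dep)
    return result


last_state_only = bcoi(3, 4, DEP)
all_states = bcoi_formula({(i, 4) for i in range(4)}, DEP)
```

Checking $x_4$ only in the last state keeps 4 components, against 4 variables in each of 4 time frames (16 components) under classical COI, which matches the 12 saved equalities mentioned for the example. Checking $x_4$ at every state of the prefix keeps 10 components.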

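In (1) we can now remove all factors of the form $\bar{s}_i(j) \leftrightarrow \ldots$
where $\bar{s}_i(j) \notin bcoi(k, f)$, and derive (for simplicity, we do not remove
initial state assignments):

$$[\![ M ]\!]_k^f := I(\bar{s}_0) \wedge T_0(\bar{s}_0, \bar{s}_1) \wedge \cdots \wedge T_{k-1}(\bar{s}_{k-1}, \bar{s}_k)$$

where

$$T_{i-1}(\bar{s}_{i-1}, \bar{s}_i) := \bigwedge_{\bar{s}_i(j) \in bcoi(k, f)} \bar{s}_i(j) \leftrightarrow f_j(\bar{s}_{i-1}) \quad \text{for } i = 1 \ldots k$$

The correctness of the bounded COI reduction is formulated in the following
theorem.

Theorem 1. Let $f = F\,q$ be an LTL formula with $q$ a propositional formula.
Then $[\![ f ]\!]_k \wedge [\![ M ]\!]_k$ is satisfiable iff $[\![ f ]\!]_k \wedge [\![ M ]\!]_k^f$ is satisfiable.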

5 Experiments 

We used the bounded model checker, BMC, on subcircuits from a PowerPC
microprocessor under design at Motorola’s Somerset design center, in Austin,
Texas. BMC accepts a subset of the input format used by the widely known
SMV model checker [9].

When a processor is under design at Somerset, designers insert assertions 
into the RTL simulation model. These Boolean expressions are important safety 
properties. The simulator flags an error if these are ever false. We checked, with 
BMC, 20 assertions chosen from 5 different processor design blocks. For each 
assertion, p, we: 

1. Checked whether p was a combinational tautology. 

2. Checked whether p was otherwise an inductive invariant. 

3. Checked whether AGp held for various time bounds, from 0 to 20. 

Each circuit latch was represented by a state variable having individual next 
state and initial state assignments. For the latter, we assigned the 0 or 1 value 
the latch would have after a designated power-on-reset sequence known to the 
designer. Primary inputs were modeled as unconstrained state variables, having 
neither next state nor initial state assignments. 

For combinational tautology checking we deleted all initialization statements
and ran BMC with k = 0, giving the propositional formula, p, as the specification.
Under these conditions, the specification could hold only if p held for all
assignments to the variables in its support.

We then checked whether p was an inductive invariant. A formula is an
inductive invariant if it holds in all initial states and is preserved by the transition
relation. Leaving all initialization assignments intact, for each design block and
each formula p, we gave p as the specification and set k = 0. This determined
whether each p held in the single, valid initial state of each design. Then, for each
design block and for each formula, p, we removed all initialization assignments
and specified p as an initial state predicate. We set k = 1 and checked the
specification AG p. If the specification held, this meant the successors of every
state satisfying p also satisfied p. Note that AG p could fail to hold exclusively
due to transitions out of unreachable states. Therefore, this technique can only
show that p is an invariant; it cannot show that it is not.
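
The three checks can be mimicked by explicit enumeration on a small model. The Python sketch below is an assumption-laden illustration (a two-latch toy model, explicit state enumeration instead of BMC and a SAT solver) of why a failed induction step does not refute an invariant.

```python
from itertools import product

STATES = list(product((0, 1), repeat=2))  # toy model with latches (a, b)


def T(s, t):
    """a keeps its value, b copies a."""
    return t == (s[0], s[0])


def I(s):
    return s == (0, 0)


def is_tautology(p):
    return all(p(s) for s in STATES)


def holds_initially(p):
    """The k = 0 check with initialization intact."""
    return all(p(s) for s in STATES if I(s))


def step_preserved(p):
    """The k = 1 check with p as the initial state predicate."""
    return all(p(t) for s in STATES if p(s) for t in STATES if T(s, t))


def reachable():
    seen = {s for s in STATES if I(s)}
    frontier = set(seen)
    while frontier:
        frontier = {t for s in frontier for t in STATES if T(s, t)} - seen
        seen |= frontier
    return seen


def p_weak(s):
    return s[1] == 0


def p_strong(s):
    return s == (0, 0)
```

Here p_weak holds on every reachable state, yet the induction step fails because the unreachable state (1, 0) satisfies p_weak and steps to (1, 1). The stronger p_strong passes both checks.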

The output of BMC is a Boolean formula in CNF that is given to a satisfiabil- 
ity solver. In these experiments, we used both the GRASP [13] and SATO [15] 
satisfiability solvers. When giving results, we give the best result from the two. 

We also ran a recent version of the SMV model checker on each of the 20 AGp 
specifications. We used command line options that enabled the early detection, 
during reachability analysis, of false AGp properties, so that SMV did not need 
to compute a fixpoint. This made the comparison to BMC more appropriate. 
We also enabled dynamic variable ordering when running SMV, and used a 
partitioned transition relation. 

All experiments were run with wall clock time limits. The satisfiability solvers 
had 15 minutes for each run, while SMV had an hour. BMC was not timed, as 
the task of translating to CNF is usually done quite quickly. The satisfiability 
solving and SMV runs were done on RS6000 model 390 workstations, having 256 
megabytes of local memory. 




66 



A. Biere et al. 



We did not model the interfaces between the 5 design blocks and the rest
of the microprocessor or the external computer system in which the processor
would be placed. This is commonly referred to as “environment modeling”. One
would ideally like to do environment modeling, since subcircuits usually work
correctly only under certain input constraints. However, a safety property that is
shown to hold with a totally unconstrained environment is a true positive. Given
Kripke structures $M'$ and $M$, $M'$ representing a design block with an unconstrained
environment and $M$ the same block with its real, constrained environment, it is
obvious that $M'$ simulates $M$, i.e., $M \le M'$ in the simulation preorder. It has
been shown in [4,6] that if $f$ is an ACTL formula, as are all the properties in
these experiments, then $M' \models f$ implies $M \models f$.

Our experiments did result in false negatives. Upon inspection, and after 
checking with circuit designers, it seems all the counterexamples generated were 
due to impossible input behaviors. However, our purpose in these experiments 
was to show the capacity and speed of bounded model checking, and the false 
negatives did not obscure these results. We discuss, in Section 6, a methodology 
wherein false negatives could be lessened or eliminated, by incorporating input 
constraints into the bounded model checking. We certainly feel this would be 
the way to use bounded model checking in a non-experimental, industrial ap- 
plication. The reader may also want to refer to [7], where input constraints are 
considered in a BDD based verification environment. 
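
The effect of an unconstrained environment can be seen on a one-latch toy example, sketched below in Python. The latch, its input constraint and the function names are hypothetical; the point is only that the unconstrained model reaches a superset of the states of the constrained one, so proofs carry over while counterexamples may be spurious.

```python
def reachable(allowed_inputs):
    """Reachable values of a latch x with x' = x OR inp, starting from x = 0."""
    seen, frontier = {0}, {0}
    while frontier:
        frontier = {x | inp for x in frontier for inp in allowed_inputs} - seen
        seen |= frontier
    return seen


def holds_AG(p, allowed_inputs):
    """Check AG p by explicit reachability under the given input values."""
    return all(p(x) for x in reachable(allowed_inputs))


UNCONSTRAINED = (0, 1)   # environment may drive the input to anything
CONSTRAINED = (0,)       # hypothetical real environment: input is always 0


def latch_stays_low(x):
    return x == 0
```

With unconstrained inputs the latch can become 1, so the check reports a counterexample that the constrained environment could never produce, which is the kind of false negative described above.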



5.1 Experimental Results 

The 5 design blocks we chose all came from a single PowerPC microprocessor, 
and were all control circuits, having little or no datapath elements. Their sizes 
were as follows: 



Circuit   Latches   PIs   Gates
bbc       209       479   4852
ccc       371       336   4529
cdc       278       319   5474
dlc       282       297   2205
sdc       265       199   2544

Before COI

Circuit   Spec    Latches   PIs
bbc       1 - 4   150
ccc       1 - 2   77        207
cdc       1 - 4   119       190
dlc       1 - 6   119       170
dlc       7       119       153
sdc       1 - 2   113       121
sdc       3       23

After (classical) COI

The first table reports the original size of each circuit, and the second
the sizes after classical COI reduction. Each specification is given an arbitrary
numeric label. These do not relate across design blocks, e.g., specification 2 of
dlc is in no way related to specification 2 of sdc. Many properties involved much
the same circuitry on a design block, as can be seen by the large number of
cones of influence having identical numbers of latches and PIs. However, these
reduced circuits were not identical, though they may have differed only in how
the variables in the specification depended, combinationally, upon latches and
PIs.



k    Bounded COI      Classic COI      No COI
0    137 / 449        234 / 546        376 / 688
1    1023 / 3762      1801 / 6790      3402 / 12749
2    2330 / 8946      3367 / 13025     6426 / 24801
3    3755 / 14631     4931 / 19259     9450 / 36851
4    5259 / 20608     6496 / 25492     12473 / 48901
5    6820 / 26821     8060 / 31725     15496 / 60951
10   14643 / 57987    15883 / 62891    30613 / 121202
15   22466 / 89153    23706 / 94057    45730 / 181452
20   30288 / 120319   31529 / 125223   60846 / 241702



Average Bounded COI Reduction 



Circuit  Spec  Tautology  Tran Rel’n  Init State
bbc      1     N          N           Y
bbc      2     N          Y           N
bbc      3     N          N           Y
bbc      4     N          N           Y
ccc      1     N          N           Y
ccc      2     N          N           Y
cdc      1     N          N           Y
cdc      2     Y          Y           Y
cdc      3     Y          Y           Y
cdc      4     Y          Y           Y
die      1     N          N           Y
die      2     N          N           Y
die      3     N          N           Y
die      4     N          N           Y
die      5     N          N           Y
die      6     N          N           Y
die      7     N          N           Y
sdc      1     N          Y           Y
sdc      2     N          N           Y
sdc      3     N          N           N



Tautology and Invariance Checking 



We ran BMC for values of k of 0, 1, 2, 3, 4, 5, 10, 15 and 20, on each
specification. For each of these, we had BMC create CNF files having no COI
reduction, only classical COI, and both classical and bounded COI. In the table
labeled “Average Bounded COI Reduction”, we give average sizes of all these CNF
files. We averaged the number of literals and clauses (a clause is a disjunct
of literals) in all the CNF files for each k, i.e., for all specifications, for
all design blocks, for that k. We checked, by hand, that this averaging did not
obscure the median case. In the table, we give, to the left of a slash, the
average number of literals for a k value, and to the right, the average number
of clauses. It can be seen that the advantage of bounded COI decreases with
increasing k. Intuitively, this is because, going out in time, eventually values
are computed for all state variables in the classical cone of influence.
However, at k up to 10, bounded COI gives distinct benefit. Since bounded model
checking seems to be most effective at finding short counterexamples, and since
tautology and invariance checking are run at low k, we feel bounded COI
augments the system’s strengths.
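
The trend can be read directly off the table. The short Python sketch below is only an illustration: it copies four rows of the “Average Bounded COI Reduction” table and computes the ratio of bounded-COI literals to classical-COI literals. The names `AVERAGES` and `literal_ratio` are hypothetical and are not part of BMC.

```python
# Average (literals, clauses) per bound k, copied from the table
# "Average Bounded COI Reduction" (a subset of the rows).
AVERAGES = {
    1:  {"bounded": (1023, 3762),    "classic": (1801, 6790)},
    5:  {"bounded": (6820, 26821),   "classic": (8060, 31725)},
    10: {"bounded": (14643, 57987),  "classic": (15883, 62891)},
    20: {"bounded": (30288, 120319), "classic": (31529, 125223)},
}


def literal_ratio(k):
    """Bounded-COI literals as a fraction of classical-COI literals at bound k."""
    bounded_literals = AVERAGES[k]["bounded"][0]
    classic_literals = AVERAGES[k]["classic"][0]
    return bounded_literals / classic_literals


for k in sorted(AVERAGES):
    print(f"k = {k:2d}: bounded/classic literals = {literal_ratio(k):.2f}")
```

The ratio moves toward 1 as k grows, which is the decreasing advantage described above.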

The table labeled “Tautology and Invariance Checking” has columns for tau- 
tology checking, for preservation by the transition relation and for preservation 
in initial states. The last two must both hold for a formula to be an inductive 
invariant. These runs were done with bounded COI enabled. A “Y” in a column
indicates a condition holding, an “N” that it does not. Time and memory usage
are not listed, since these were < 1 second and < 5 megabytes in all but three cases.
In the worst case, sdc specification 2, 60 seconds of CPU time and 6.5 megabytes 
of memory were required, for checking preservation by the transition relation. 
Clearly, tautology and invariance checking can be remarkably inexpensive. In 
contrast, these can be quite costly with BDD based methods. 
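
The three conditions in the table can be stated concretely on a toy system. The sketch below is an assumption-laden illustration: it enumerates states explicitly, whereas the experiments above pose each check as a satisfiability problem. The toy counter and all function names are hypothetical.

```python
def is_tautology(p, states):
    """p holds in every state, reachable or not."""
    return all(p(s) for s in states)


def holds_initially(p, init_states):
    """p holds in every initial state."""
    return all(p(s) for s in init_states)


def preserved_by_transitions(p, states, successors):
    """Whenever p holds in a state, it holds in all of its successors."""
    return all(p(t) for s in states if p(s) for t in successors(s))


def is_inductive_invariant(p, states, init_states, successors):
    """Both preservation conditions must hold."""
    return holds_initially(p, init_states) and preserved_by_transitions(
        p, states, successors
    )


# Toy system (hypothetical): a counter cycling 0, 1, 2, 0, ...; state 3 is
# unreachable and resets to 0.
STATES = range(4)
INIT = [0]


def successors(s):
    return [(s + 1) % 3] if s < 3 else [0]


def never_three(s):
    return s != 3


def never_two(s):
    return s != 2


print(is_tautology(never_three, STATES))                              # not a tautology
print(is_inductive_invariant(never_three, STATES, INIT, successors))  # inductive
print(is_inductive_invariant(never_two, STATES, INIT, successors))    # not inductive
```

Here `never_three` corresponds to an “N / Y / Y” row of the table: it is not a tautology, but it holds initially and is preserved by every transition.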



circuit  spec  long k  vars   clauses  time  mem  holds  fail k
bbc      1     4       7873   30174    35.4  NR   Y
bbc      2     15      34585  93922    5.5   84   N      0
bbc      3     10      16814  63300    58    NR   Y
bbc      4     5              35658    18    NR   Y
ccc      1     5       9396   40450    1.3   36   N      1
ccc      2     5       9148   38841    1.4   39   N      1
cdc      1     20      49167  207764   128   77   N      2
cdc      2     20      50825  213137   4.7   NR   Y
cdc      3     20      50571  213614   4.7   NR   Y
cdc      4     20      50491  212406   4.8   NR   Y
die      1                             2.9   64          2
die      2     20      18024  69830    2.8   63   N      2
die      3     20      17603  68333          60   N      2
die      4     20      18085  69942    2.73  61   N      1
die      5     20      18378  71291    2.9   60   N      2
die      6     20      17712  68714    2.7   NR   N      2
die      7     20      16217  63781    2.4   64   N      0
sdc      1     4       5554   20893    72    14   Y
sdc      2     4       5545   20841    548   21   Y
sdc      3     20      4119   15168    -     3    N      0



Highest k Values 



The table labeled “Highest k Values” shows the results of increasing k. These
runs, again, were with bounded COI. We ran to large k regardless of whether
we found counterexamples, or determined a property was an invariant, at lower
k. It was sometimes difficult to obtain memory usage statistics during
satisfiability solving; but this usually does not exceed that needed to store
the CNF formula. In the table, NR means not recorded (data unavailable). Time
is given in seconds, memory usage in megabytes, with dashes appearing where
these were insignificant. The “vars” and “clauses” columns give the number of
literals and clauses in the CNF file for the highest value of k on which
satisfiability solving completed, the k in the “long k” column. The time and
memory usage listings are for satisfiability solving at this highest k value.
A “Y” in the “holds” column indicates the property held through all values of k
tested, and an “N” indicates a counterexample was found. When these were found,
the “fail k” column gives the first k at which a counterexample appeared. Time
and memory consumption are not listed for the runs giving counterexamples,
because the satisfiability solving took less than a second, and no more than
5 megabytes of memory, in each case!

Lastly, the BDD-based model checker, SMV, completed only one of the 20 
verifications it was given. The 19 others all timed out at one hour of wall clock 
time, with SMV unable to build the BDDs for the partitioned transition relation. 
SMV was only able to complete the verification of sdc, specification 3. Classical 
COI for this specification gave a very small circuit, having only 23 latches and
15 PIs. SMV found the specification false in the initial state, in approximately
2 minutes. Even this, however, can be contrasted to BMC needing 2 seconds to 
translate the specification to CNF, and the satisfiability solver needing less than 
1 second to check it! 

6 A Verification Methodology 

Our experimental results lead us to propose an automated methodology for 
checking safety properties on industrial designs. In what follows, we assume a 
design divided up into separate blocks, as is the norm with hierarchical VLSI 
designs. Our methodology is as follows: 

1. Annotate each design block with Boolean formulae required to hold at all 
time points. Call these the block’s inner assertions. 

2. Annotate each design block with Boolean formulae describing constraints on 
that block’s inputs. Call these the block’s input constraints. 

3. Use the procedure outlined in Section 6.1 to check each block’s inner as- 
sertions under its input constraints, using bounded model checking with 
satisfiability solving. 

This methodology could be extended to include monitors for satisfaction of 
sequential constraints, in the manner described in [7], where input constraints 
were considered in the context of BDD based model checking. 

6.1 Safety Property Checking Procedure 

Let us consider a Kripke structure, K, for a design block having input
constraints, c. A constrained Kripke structure, $K_c$, can be derived from K as
in Section 2. To check whether an inner block assertion, p, is an invariant in
$K_c$, we need not work with $K_c$ directly. Unrolling the transition relation
of $K_c$, as per formula (1) of Section 3, is entirely equivalent to unrolling
the transition relation of K, and conjoining each term with the constraint
function, c:

$$[\![ K_c ]\!]_k := I(s_0) \wedge c(s_0) \wedge T(s_0, s_1) \wedge c(s_1) \wedge \cdots \wedge T(s_{k-1}, s_k) \wedge c(s_k) \qquad (3)$$
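
The effect of conjoining the constraint at every time step can be seen on a small explicit example. The sketch below is a hypothetical illustration, not the BMC implementation: it enumerates paths explicitly instead of building a CNF formula, and the toy transition system, the property, and the constraint are all assumptions chosen for the example.

```python
def bounded_counterexample(init, successors, constraint, p, k):
    """Search for a path s_0 .. s_j (j <= k) with s_0 initial, the constraint
    holding in every state of the path, and p violated in s_j.
    Returns the path as a list of states, or None if there is none."""
    frontier = [[s] for s in init if constraint(s)]
    for _ in range(k + 1):
        for path in frontier:
            if not p(path[-1]):
                return path
        # Extend every path by one constrained transition.
        frontier = [
            path + [t]
            for path in frontier
            for t in successors(path[-1])
            if constraint(t)
        ]
    return None


# Toy system (hypothetical): from state s, move to s+1 or s+2, saturating at 7.
def successors(s):
    return [min(s + 1, 7), min(s + 2, 7)]


def p(s):                 # the assertion to check: state 3 is never reached
    return s != 3


def unconstrained(s):     # no input constraint
    return True


def even_only(s):         # constraint: only even states are legal behaviors
    return s % 2 == 0


print(bounded_counterexample([0], successors, unconstrained, p, 2))
print(bounded_counterexample([0], successors, even_only, p, 5))
```

Without the constraint a counterexample of length 2 exists; with the constraint conjoined at every step, no counterexample is found within the bound, which is how input constraints remove counterexamples caused by impossible input behaviors.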

The steps for checking whether a block’s inner assertion, p, is an invariant 
under input constraints, c, are: 

1. Check whether p is a combinational tautology in K. If it is, exit.

2. Check whether p is an inductive invariant for K. If it is, exit.

3. Check whether p is a combinational tautology in $K_c$. If it is, go to step 6.

4. Check whether p is an inductive invariant for $K_c$. If it is, go to step 6.

5. Check if a bounded length counterexample exists to AGp in $K_c$. If one is
found, there is no need to examine c, since the counterexample would exist
without input constraints¹. If a counterexample is not found, go to step 6.
The input constraints may need to be reformulated and this procedure
repeated from step 3.

6. Check the input constraints, c, on pertinent design blocks, as explained
below.
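
The control flow of the six steps can be sketched as follows. This is a minimal sketch under stated assumptions: the three callbacks stand in for the tautology, inductive-invariance, and bounded-counterexample checks described above, and their names, signatures, and the returned strings are hypothetical.

```python
from typing import Callable


def check_inner_assertion(
    tautology: Callable[[bool], bool],        # argument: use input constraints?
    inductive: Callable[[bool], bool],        # argument: use input constraints?
    counterexample_at: Callable[[int], bool], # bounded search in K_c at bound k
    max_k: int,
) -> str:
    # Steps 1 and 2: checks on the unconstrained structure K.
    if tautology(False) or inductive(False):
        return "holds"
    # Steps 3 and 4: checks on the constrained structure K_c.
    if tautology(True) or inductive(True):
        return "holds under constraints; check constraints (step 6)"
    # Step 5: bounded counterexample search in K_c, limited to max_k.
    for k in range(max_k + 1):
        if counterexample_at(k):
            return f"fails at k={k}"
    return "no counterexample up to max_k; check constraints (step 6)"


# Usage with stub oracles: p is inductive only once the constraints are applied.
print(check_inner_assertion(
    tautology=lambda constrained: False,
    inductive=lambda constrained: constrained,
    counterexample_at=lambda k: False,
    max_k=20,
))
```

The bound `max_k` reflects the limit a design team would place on the counterexample search.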

Inputs that are constrained in one design block, A, will, in general, be outputs
of another design block, B. To check A’s input constraints, we turn them into
inner assertions for B, and check them with the above procedure. One must
take precautions against circular reasoning. Circular reasoning can, however,
be detected automatically, and should not, therefore, be a barrier to this
methodology.
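
One possible way to detect circular reasoning automatically, offered here only as an assumption since the detection method is not specified above, is to record which blocks each block's input constraints depend on and to search that graph for a cycle. The function name and the graph encoding below are hypothetical.

```python
def find_cycle(depends_on):
    """depends_on maps a block to the blocks whose inner assertions are used to
    justify its input constraints. Returns one dependency cycle as a list of
    blocks (first block repeated at the end), or None if there is no cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(node, path):
        color[node] = GREY
        current = path + [node]
        for nxt in depends_on.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: a cycle is closed.
                return current[current.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt, current)
                if found:
                    return found
        color[node] = BLACK
        return None

    for node in depends_on:
        if color.get(node, WHITE) == WHITE:
            found = visit(node, [])
            if found:
                return found
    return None


# Block A relies on B's outputs and B relies on A's: circular.
print(find_cycle({"A": ["B"], "B": ["A"]}))
# A relies on B, B relies on C, C relies on nothing: acceptable.
print(find_cycle({"A": ["B"], "B": ["C"], "C": []}))
```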

The ease with which we carried out tautology and invariance checking indi- 
cates the above is entirely feasible. Searching for a counterexample, step 5, may 
become costly at high k values; however, this can be arbitrarily limited. It is 
expected that design teams would set limits for formal verification, and would 
complement its use with simulation, for the remainder of available resources. 

7 Conclusion 

In this paper, we have outlined a specialized version of cone of influence
reduction for bounded model checking. The present experiments, on a large and
complex PowerPC microprocessor, are compelling. They tell us that, for some
applications, the efficiency of model checking has increased by orders of
magnitude. The fact that the BDD-based SMV model checker failed to complete on
all but one of 20 examples underscores this point. We still believe, however,
that BDD-based model checking fills important needs. Certainly, it seems to be
the only technique that can presently find long counterexamples, though, of
course, this can be done only for designs that fall within its capacity
limitations.

¹ This is implied by the theorems for ACTL formulae in [4,6], which we referred
to in Section 5.

We feel that new verification methodologies can now be introduced in indus-
try, to take advantage of bounded model checking. We have outlined one such
procedure here, for checking safety properties. Our hope is that the widened use
of model checking will illuminate further possibilities for optimization.

Abstract. A common technique in high-performance hardware design is
to intersperse combinatorial logic freely between level-sensitive latch lay-
ers (wherein one layer is transparent during the “high” clock phase, and
the next during the “low”). Such logic poses a challenge to verification:
unless the two-phase netlist N may be abstracted to a full-cycle model N'
(wherein each memory element may sample every cycle), model checking
of N requires at least twice as many state variables as would be neces-
sary to obtain equivalent coverage for N'. We present an algorithm to
automatically obtain such an abstraction by selectively eliminating lat- 
ches from both layers. The abstraction is valid for model checking CTL* 
formulae which reason solely about latches of a single phase. This algo- 
rithm has been implemented in IBM’s model checker, RuleBase, and has 
been used to enable model checking of IBM’s Gigahertz Processor, which 
may not have been feasible otherwise. This abstraction has furthermore 
allowed verification engineers to write properties and environments more 
efficiently. 



1 Introduction 

A latch is a hardware memory element with two Boolean inputs - data and clock 
- and one Boolean output. A behavioral definition for latches is provided in [1]. 
High performance netlists often must use level-sensitive latches [2]. For such a 
latch, when its clock input is a certain value (e.g., a logical “1”), the value at 
its data input will be propagated to its data output (i.e., transparent mode); 
otherwise, its last propagated value is held at its output. 
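
The behavior just described can be written down in a few lines. The following Python sketch is illustrative only; the function names and the sample waveforms are hypothetical.

```python
def latch_step(held, data, clock, transparent_when=1):
    """One time-step of a level-sensitive latch: pass the data input through
    when the clock equals transparent_when, otherwise hold the last value."""
    return data if clock == transparent_when else held


def simulate(data_stream, clock_stream, initial=0):
    """Return the latch output at each time-step."""
    outputs, held = [], initial
    for data, clock in zip(data_stream, clock_stream):
        held = latch_step(held, data, clock)
        outputs.append(held)
    return outputs


# The latch follows its data input while the clock is 1 and holds otherwise.
print(simulate(data_stream=[1, 0, 0, 1, 0], clock_stream=[1, 0, 1, 0, 1]))
```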

The clock is modeled as a signal which alternates between 0 and 1 at every
time-step. A latch which samples when the clock is a 1 will be denoted as an
L1 latch; one which samples when the clock is a 0 will be denoted as an L2
latch. Hardware design rules, arising from timing constraints, require any logic
path between two L1 latches to pass through an L2 latch, and vice-versa. An
elementary design style requires each L1 latch to feed directly to an L2 latch
(called a master-slave latch pair), and allows only L2 to drive combinatorial logic.
However, a common high-performance hardware development technique involves
utilizing combinatorial logic freely between L1 and L2 latches to better utilize
each half-cycle. It should be noted that such designs are typically explicitly
implemented in this manner; this topology is not the byproduct of a synthesis
tool.

There are two major problems with the verification of such netlists. First, 
because of the larger number of latches, the verification tool requires much more 
time and memory. Additionally, the manual modeling of environments and pro- 
perties is more complicated in that they must be written in terms of the less 
abstract half-cycle model, and an oscillating clock must be explicitly introduced. 

Most hardware compilers will allow automatic translations of a master-slave 
latch pair into a single flip-flop; retiming algorithms [3] may be used to retime 
the netlist such that L1-L2 layers become adjacent and one-to-one. However, 
retiming adds complexity in that the specification, the environment, and any 
witnesses / counterexamples (all of which may “observe” the netlist), may need 
to be retimed as well to match the retimed, full-cycle model. 

We develop an efficient algorithm for abstracting a half-cycle netlist N to
a full-cycle model N' which may be utilized for enhanced verification in any
FSM-based verification framework (e.g., simulation and model checking). We
will achieve this by selectively eliminating some latches. We will use a notion of
dual-phase-bisimulation equivalence between the abstracted and unabstracted
models. This equivalence ensures that specification and environment written in
terms of L2 latch outputs need not be modified other than a conversion to full-
cycle format (as will be discussed in Sect. 3). Our algorithm performs maximum
such reductions, and thus provides an important model reduction step which may
greatly augment existing techniques (such as retiming, cone-of-influence, etc.).
As we show, this reduction alone reduces the number of state variables by at
least one-half, and has greatly enhanced the model checking of IBM’s Gigahertz
Processor, which may not have been feasible otherwise (as demonstrated by
our experimental evidence). This abstraction is now part of the model checker
RuleBase [4]. Additionally, designers and verification engineers prefer to reason
about the full-cycle models.

The optimality of the algorithm results from the identification of minimal
dependent layers (MDL) of latches, and removing all L1s or all L2s per MDL.

Definition 1. A dependent layer is a set of L1 and L2 latches, $L1'$ and $L2'$,
such that $L2'$ is a superset of all latches in the transitive fanout of $L1'$,
and $L1'$ is a superset of all latches in the transitive fanin of $L2'$.

Definition 2. A dependent layer is termed minimal if and only if there does
not exist a nonempty set of L1 and L2 latches V which may be removed from
that layer and still result in a nonempty dependent layer.

Consider the netlist in Fig. 1 (the triangles denote combinatorial logic, and
the rectangles denote latches). The L1 latches are shaded. The two unique MDLs
are marked with dotted boxes. Merely removing all L1s or all L2s will not yield
an optimum reduction in this case; the L1s of layer A, and the L2s of layer B
should be removed to yield an optimum solution for this netlist, which removes
four of the six latches.




Fig. 1. Sample Netlist with Two Minimal Dependent Layers



In Sect. 2 we introduce a half-cycle netlist, and two different abstracted full- 
cycle models of this netlist. In Sect. 3 we study the state space of the netlist 
and its two abstracted models to demonstrate the validity of the abstraction for 
CTL* formulae which reason solely about latches of a single type (L1 or L2). In
Sect. 4 we introduce the algorithm used to perform the netlist reduction, and 
demonstrate its optimality. In Sect. 5 we give some experimental results of the 
use of this algorithm as implemented in RuleBase [4] for application to IBM’s 
Gigahertz Processor. 



2 Half-Cycle versus Full-Cycle Models 

Consider the half-cycle netlist, denoting an MDL, shown in Fig. 2. All nets and 
primitives may be vectors. 




Fig. 2. Half-Cycle Netlist N 



Definition 3. A netlist is dual-phase (DP) if and only if:

1. All latches in the transitive fanouts of L1 latches are L2 latches, and
2. All latches in the transitive fanin of L2 latches are L1 latches, and
3. No primary inputs exist in the transitive fanin of any L2 latch, and
4. No primary outputs exist in the transitive fanout of any L1 latch.

The first two rules are enforced by hardware timing constraints. Note that,
at the periphery of a design, there may be some inputs which have L2 latches
in their transitive fanout, and outputs which are in the transitive fanout of
L1 latches (thus violating rules 3 and 4). While the analysis presented in this
paper disallows such connectivity for simplicity, these cases are supported by our
implementation; for ease of reasoning, we have found it beneficial to preserve all
L2 latches which violate rule 3, and to remove all L1 latches which violate rule 4.

The notion of MDLs (Defn. 2) allows us to partition the design under test
into a maximum number of partitions such that each is DP. Next, we propose
two abstractions for DP netlists. For each DP partition of the original design, one
of these two abstractions may be applied independently of the other partitions,
thus yielding an overall abstraction which has a globally minimum number of
latches (refer to Theorem 5). This minimum would, in general, be less than
that obtained by removing either all of the L1 or all of the L2 latches.

In this paper, we assume that properties may only refer to the L2 nets (which
we term L2-visible properties). In our actual implementation, we also handle
the case where the properties refer only to L1 nets. Furthermore, by forcing our
tool to remove only L1 or only L2 latches (i.e., restricting its freedom to choose
which type to remove), each property may refer to both types of nets. However,
we skip these generalizations in this paper.

2.1 The Abstracted Models 

The values of the nets in Fig. 2 are specified for time-steps i > 0. The pre-
specified initial values of the latches are $B_0(0..m-1)$ and $D_0(0..n-1)$.
Let c denote the clock input, which initializes to 1, and alternates between
1 and 0 at every time step, indicating whether the L1 or L2 latches
(respectively) are presently transparent. The subscript i means “at time i”.

For $i > 0$, if $(c_i = 1)$, $B_i(0..m-1) = A_{i-1}(0..m-1)$, else
$B_i(0..m-1) = B_{i-1}(0..m-1)$. Similarly, for $i > 0$, if $(c_i = 1)$,
$D_i(0..n-1) = D_{i-1}(0..n-1)$, else $D_i(0..n-1) = C_{i-1}(0..n-1)$.
For the combinatorial nets, $A_i(0..m-1) = f_1(PI_i(0..k-1), D_i(0..n-1))$;
$C_i(0..n-1) = f_2(B_i(0..m-1))$; and $PO_i(0..o-1) = f_3(D_i(0..n-1))$.

Either layer of latches may be removed (and the remaining layer transformed
to flip-flops which may be clocked every cycle - not by an alternating clock),
and the resulting abstracted model will be shown to be bisimilar to the original
netlist. Fig. 3 shows the first abstraction, $N'$, with layer L2 removed. We need
a new variable, labeled $f$, whose initial value is 1, and thereafter is 0. This
latch ensures that the initial value $D_0$ from the original netlist N (which need
not be deterministic) is applied to the combinatorial nets D in $N'$.
$B_0(0..m-1)$ is still the initial value of the remaining latches. For $i > 0$,
$B_i(0..m-1) = A_{i-1}(0..m-1)$. If $(f_i = 1)$, $D_i(0..n-1) = D_0(0..n-1)$,
else $D_i(0..n-1) = C_i(0..n-1)$. For the other combinatorial nets,
$A_i(0..m-1) = f_1(PI_i(0..k-1), D_i(0..n-1))$;
$C_i(0..n-1) = f_2(B_i(0..m-1))$; and $PO_i(0..o-1) = f_3(D_i(0..n-1))$.




Fig. 4. Alternate Abstracted Model N'' 



Fig. 4 illustrates the second abstraction, which removes the L1s. $D_0(0..n-1)$
is the initial value of the remaining latches; $D_i(0..n-1) = C_{i-1}(0..n-1)$.
For the combinatorial nets, $A_i(0..m-1) = f_1(PI_i(0..k-1), D_i(0..n-1))$;
$B_i(0..m-1) = A_i(0..m-1)$; $C_i(0..n-1) = f_2(B_i(0..m-1))$; and
$PO_i(0..o-1) = f_3(D_i(0..n-1))$. Note that the $f$ variable is unnecessary for
this abstraction; the initial value of the removed latch does not propagate.

It is noteworthy that either one of the two abstractions may be chosen; since 
the layers may be of differing width ($m \neq n$), the removal of one layer may result
in a smaller state space than the other. We term both of the above reductions 
as dual-phase (DP) reductions. 

3 Validity of Abstraction 

We define a notion of dual-phase-bisimulation relation (inspired by Milner’s
bisimulation relations [5]); this notion is preserved for composition of Moore
machines. Further, if two structures are related by a dual-phase-bisimulation
relation, we show that L2-visible CTL* properties are preserved (modulo a simple
transformation). We show the existence of a dual-phase-bisimulation relation for
both abstractions presented in the previous section.

We will relate our designs to Kripke structures, which are defined as follows.

Definition 4. A Kripke structure $K = (S, S_0, A, \mathcal{L}, R)$, where S is
a set of states, $S_0 \subseteq S$ is the set of initial states, A is the set of
atomic propositions, $\mathcal{L} : S \to 2^A$ is the labeling function, and
$R \subseteq S \times S$ is the transition relation.

Our designs are described as Moore machines (using Moore machines, instead 
of the more general Mealy machines [6], simplifies the exposition for this paper, 
though our implementation is able to handle Mealy machines). We use the fol- 
lowing definitions for a Moore machine and its associated structure (similar to 
Grumberg and Long [7]). 

Definition 5. A Moore machine $M = (L, S, S_0, I, O, V, \delta, \gamma)$, where
L is the set of state variables (latches), $S = 2^L$ is the set of states,
$S_0 \subseteq S$ is the set of initial states, I is the set of input variables,
O is the set of output variables, $V \subseteq L$ is the set of property visible
nets, $\delta \subseteq S \times 2^I \times S$ is the transition relation, and
$\gamma : S \to 2^O$ is the output function.

Definition 6. The structure of a Moore machine
$M = (L, S, S_0, I, O, V, \delta, \gamma)$ is denoted by
$K(M) = (S^K, S_0^K, A, \mathcal{L}, R)$, where $S^K = 2^L \times 2^I$,
$S_0^K = S_0 \times 2^I$, $A = V$, $\mathcal{L} : S^K \to 2^V$, and
$R((s, x), (t, y))$ iff $\delta(s, x, t)$.

In the sequel we will use M to denote the Moore machine as well as the 
structure for the machine. We now define our notion of dual-phase-bisimilarity, 
which characterizes our proposed abstraction. 

Definition 7. Let M and M' be two structures. A relation
$G \subseteq S \times S'$ is a dual-phase-bisimulation relation if $G(s, s')$
implies:

1. $\mathcal{L}(s) = \mathcal{L}'(s')$.

2. for every $t, v \in S$ such that $R(s, v)$ and $R(v, t)$, we have
$\mathcal{L}(s) = \mathcal{L}(v)$, and there exists $t' \in S'$ such that
$R'(s', t')$ and $G(t, t')$.

3. for every $t' \in S'$ such that $R'(s', t')$, there exist $t, v \in S$ such
that $\mathcal{L}(v) = \mathcal{L}(s)$, $R(s, v)$, $R(v, t)$, and $G(t, t')$.

We say that a dual-phase-bisimulation exists from M to M' (denoted by
$M \prec M'$) iff there exists a dual-phase-bisimulation relation G such that
for all $s \in S_0$ and $t' \in S'_0$, there exist $t \in S_0$ and
$s' \in S'_0$ such that $G(s, s')$ and $G(t, t')$.

Notice that, in the above definition, such a dual-phase-bisimulation relation
may exist only if M has a dual-phase nature - i.e., for all i, the visible labels of
states $s_{2i}$ and $s_{2i+1}$ are equivalent.

An infinite path $\pi = (s_0, s_1, s_2, \ldots)$ is a sequence of states
($s_0 \in S_0$) such that any two successive states are related by the transition
relation (i.e., $R(s_i, s_{i+1})$). Let $\pi^i$ denote the suffix path
$(s_i, s_{i+1}, \ldots)$. A dual-phase-bisimulation relation exists between two
infinite paths $\pi = (s_0, s_1, s_2, \ldots)$ and
$\pi' = (s'_0, s'_1, s'_2, \ldots)$, denoted by $G(\pi, \pi')$, iff for every i,
$G(s_{2i}, s'_i)$.

The composition of Moore machines ($M_1 \parallel M_2$) is defined in the standard
way [7], by allowing the outputs of one design to become inputs of the other.
The following result is shown similarly to the proof that simulation precedence
is preserved under composition [7].

Theorem 1. If $M_1 \prec M'_1$ and $M_2 \prec M'_2$, then a
dual-phase-bisimulation exists from the Moore composition $M_1 \parallel M_2$ to
the Moore composition $M'_1 \parallel M'_2$ (i.e.,
$M_1 \parallel M_2 \prec M'_1 \parallel M'_2$).

The set of dual-phase-reducible CTL* formulae is a subset of CTL* formu- 
lae [8], and is a set of state and path formulae given by the following inductive 
definition. We also define the dual-phase reduction for such formulae: 

Definition 8. A dual-phase-reducible (DPR) CTL* formula $\phi$ and its dual-
phase reduction, denoted by $\Omega(\phi)$, are defined inductively as:

- every atomic proposition p is a DPR state formula $f = p$; $\Omega(f) = p$.

- if p is a DPR state formula, so is $f = \neg p$; $\Omega(f) = \neg\Omega(p)$.

- if p, q are DPR state formulae, so is $f = p \wedge q$;
$\Omega(f) = \Omega(p) \wedge \Omega(q)$.

- if p is a DPR path formula, then $f = Ep$ is a DPR state formula;
$\Omega(f) = E\Omega(p)$.

- each DPR state formula f is also a DPR path formula.

- if p is a DPR path formula, so is $f = \neg p$; $\Omega(f) = \neg\Omega(p)$.

- if p, q are DPR path formulae, so is $f = p \wedge q$;
$\Omega(f) = \Omega(p) \wedge \Omega(q)$.

- if p is a DPR path formula, so is $f = XXp$; $\Omega(f) = X\Omega(p)$.

- if p, q are DPR path formulae, so is $f = pUq$;
$\Omega(f) = \Omega(p)\,U\,\Omega(q)$.

Note that XX is transformed to X through the reduction; intuitively, this is
due to the “doubling of the clock frequency”, or the replacement of the oscillating
clock with an “always active” clock, enabled by the abstraction. As an example,
if $f = AG(rdy \to AXAX(req \to AF(ack)))$, then
$\Omega(f) = AG(rdy \to AX(req \to AF(ack)))$ (note that AXAXp is equivalent to
AXXp). L2-visible properties may be readily expressed utilizing DPR CTL*, since
latches of any given type may only toggle every second time-step; there is no
need to express such a property with a single X, which is the only restriction
we impose upon full CTL* expressibility.
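
The dual-phase reduction of Definition 8 is a straightforward structural recursion. The sketch below is an illustration under assumed conventions: formulae are nested Python tuples, and the operator tags (`"ap"`, `"not"`, `"and"`, `"E"`, `"XX"`, `"U"`) and the function name are hypothetical.

```python
def reduce_dpr(formula):
    """Dual-phase reduction of a DPR CTL* formula given as a nested tuple.
    Only the constructs of Definition 8 are accepted; in particular a single
    X is rejected, since it is not dual-phase-reducible."""
    op = formula[0]
    if op == "ap":                       # atomic proposition: unchanged
        return formula
    if op == "not":
        return ("not", reduce_dpr(formula[1]))
    if op == "and":
        return ("and", reduce_dpr(formula[1]), reduce_dpr(formula[2]))
    if op == "E":
        return ("E", reduce_dpr(formula[1]))
    if op == "XX":                       # two half-cycle steps become one full cycle
        return ("X", reduce_dpr(formula[1]))
    if op == "U":
        return ("U", reduce_dpr(formula[1]), reduce_dpr(formula[2]))
    raise ValueError(f"not a dual-phase-reducible construct: {op!r}")


# E(rdy U XX ack) reduces to E(rdy U X ack).
half_cycle = ("E", ("U", ("ap", "rdy"), ("XX", ("ap", "ack"))))
print(reduce_dpr(half_cycle))
```

Universal path quantification is not listed separately because it can be expressed through negation and E.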

Theorem 2. Let s and s' be states of M and M', and
$\pi = (s_0, s_1, s_2, \ldots)$ and $\pi' = (s'_0, s'_1, s'_2, \ldots)$ be
infinite paths of M and M', respectively. If G is a dual-phase-bisimulation
relation such that $G(s, s')$ and $G(\pi, \pi')$, then

1. for every dual-phase-reducible CTL* state formula $\phi$,
$s \models \phi$ iff $s' \models \Omega(\phi)$.

2. for every dual-phase-reducible CTL* path formula f,
$\pi \models f$ iff $\pi' \models \Omega(f)$.

We describe the Moore specifications (Defn. 5) for the abstractions presented
in Sect. 2. Refer to Figs. 2-4. Let c be the clock variable which alternates between
1 and 0, indicating whether the L1 or L2 latches are presently transparent,
respectively. The original netlist
$N = (L^N, S^N, S^N_0, PI, PO, V^N, \delta^N, \gamma^N)$ has
$L^N = B \cup D \cup \{c\}$, $V^N = D$, and the transition and output functions,
$\delta^N$ and $\gamma^N$, are given by the formulae in Sect. 2.1. The state
space of N is denoted by $(b, d, c, x)$, comprising the values of latches B, D,
c, and the input PI, respectively.

As presented here, the properties cannot refer to inputs - V does not contain
inputs. This restriction is due to the requirement that visible labels of states
$s_{2i}$ and $s_{2i+1}$ are identical (Defn. 7), and is not necessary if the
inputs to the design do not change values between $s_{2i}$ and $s_{2i+1}$. This
assumption is typically sound; except for clock inputs (which no longer need to
be modeled), synthesis timing constraints enforce this requirement (since the
partition will ultimately be composed with other partitions, or occur at chip
boundaries). After our abstraction, the environment is no longer constrained to
toggle only once every two time-steps, but may toggle every time-step - this
reflects a conversion of the environment from half-cycle to full-cycle, and
(along with the synthesis requirements reflected in rules 1 and 2 of Defn. 3,
and the synthesis requirement that the design be free from combinatorial loops)
allows applicability of this abstraction to Mealy machines.

The first abstraction $N' = (L', S', S'_0, PI, PO, V', \delta', \gamma')$, which
we denote the “remove-L2” abstraction, has $L' = B \cup \{f\}$,
$S'_0 = (B_0, 1)$, $V' = D$. Again, the transition and output functions,
$\delta'$ and $\gamma'$, are given by the formulae in Sect. 2.1. The state space
of $N'$ is denoted by $(b, f, x)$, comprising the values of latches B, f, and
the input PI, respectively. The second abstraction $N''$, which we denote the
“remove-L1” abstraction, has $L'' = D$, $S''_0 = D_0$, $V'' = D$. The state
space of $N''$ is denoted by $(d, x)$, comprising the values of latches D and
the input PI, respectively. Note that we define V in all cases as D, the L2
latch outputs, as necessary for arbitrary L2-visible properties, and to enable
the dual-phase-bisimulation.

Theorem 3. If $N'$ is a “remove-L2” abstraction of N, then $N \prec N'$.

Proof. The following relation G between states of N and $N'$ is a dual-phase-
bisimulation relation. G is defined so that it is 1 only for the following two cases:

- for any x, $G((B_0, D_0, 1, x), (B_0, 1, x))$ is 1

- for any $(b, d, 1, x)$ reachable from the initial state of N after at least one
transition, $G((b, d, 1, x), (b, 0, x))$ is 1

Theorem 4. If $N''$ is a “remove-L1” abstraction of N, then $N \prec N''$.

Proof. The following relation G between states of N and $N''$ is a dual-phase-
bisimulation relation. G is defined so that it is 1 only for the following two cases:

- for any x, $G((B_0, D_0, 1, x), (D_0, x))$ is 1

- for any $(b, d, 1, x)$ reachable from the initial state of N after at least one
transition, $G((b, d, 1, x), (d, x))$ is 1

Theorems 1, 2, 3 and 4 allow us to apply the two abstractions independently
on each dependent layer, and still show the validity of model checking L2-visible
CTL* formulae on the composition of the abstractions:

Corollary 1. If $D'$ is obtained from D by applying either the “remove-L2”
abstraction or the “remove-L1” abstraction, independently, on each of its
minimal dependent layers, then $D \prec D'$.

4 Algorithm for the Abstraction of DP Netlists

The algorithm picks a primary input at random; a while loop ensures that every
primary input is chosen. It then finds the latches in the transitive fanout of this
input; this set must consist solely of L1 latches (except for inputs connected
to L2s, which are treated specially). It places the elements of this set one at
a time into the set $L1'$. For each latch in $L1'$ not previously considered,
it finds all L2 latches in the transitive fanout of $L1'$; this set is denoted
$L2'$. It then looks for any latches in the transitive fanin of $L2'$ (these
must be L1s) and adds them to $L1'$. It then iteratively ping-pongs between the
L1s and L2s for this MDL until no new latches are found. These latches are now
labeled with their type and layer identifier (which is then incremented), and a
record is kept as to the number of L1 and L2 latches in that layer. It then
continues iteratively with the next latch in the fanout set of the chosen input.

The algorithm then looks for L1 latches in the transitive fanout of the L2s 
encountered in the previous layers. If it finds any, these new MDLs are explored 
iteratively as above until no new latches are encountered. The outer while loop 
then begins traversing from the next primary input. 

If the algorithm encounters a previously-marked L1 latch while looking for an 
L2 latch (or vice-versa), it flags this violation. If no violation has been reported 
during the analysis, the netlist is DP, and reduction may proceed. 

After the above analysis, either the L1 or the L2 set may be removed per layer; 
these layers are minimal by construction. A simple iteration over every MDL will 
yield optimum reductions; if the given layer has more L2s than L1s, the L2s of 
that layer should be replaced with multiplexors as discussed in Sect. 2.1. If not, 
the L1s of that layer should be replaced with wires. 
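
The traversal and the per-layer choice described above can be sketched as follows. This is only an illustration under simplifying assumptions, not the RuleBase implementation: latch types are assumed to be known in advance, the combinational logic is assumed to be already collapsed into a latch-to-latch fanout map, and the names find_layers and plan_reduction are hypothetical.

```python
from collections import defaultdict


def find_layers(latch_type, fanout):
    """Partition latches into minimal dependent layers (MDLs).

    latch_type: dict mapping latch name -> "L1" or "L2" (assumed known).
    fanout: dict mapping latch name -> set of latches reached through
            combinational logic only (an assumed, pre-collapsed view).
    Returns a list of (l1_set, l2_set) pairs, one per MDL.
    Raises ValueError on an L1-L2 connectivity violation.
    """
    fanin = defaultdict(set)
    for src, sinks in fanout.items():
        for dst in sinks:
            fanin[dst].add(src)

    layer_of = {}
    layers = []
    for seed in latch_type:
        if seed in layer_of:
            continue
        l1s, l2s = set(), set()
        (l1s if latch_type[seed] == "L1" else l2s).add(seed)
        layer_of[seed] = len(layers)
        frontier = [seed]
        while frontier:
            latch = frontier.pop()
            # Ping-pong: L1s look forward to L2s, L2s look backward to L1s.
            if latch_type[latch] == "L1":
                neighbours, expected = fanout.get(latch, ()), "L2"
            else:
                neighbours, expected = fanin.get(latch, ()), "L1"
            for other in neighbours:
                if latch_type[other] != expected:
                    raise ValueError(f"connectivity violation: {latch} / {other}")
                if other not in layer_of:
                    layer_of[other] = len(layers)
                    (l1s if expected == "L1" else l2s).add(other)
                    frontier.append(other)
        layers.append((l1s, l2s))
    return layers


def plan_reduction(layers):
    """Per layer, remove whichever latch set is larger (ties remove the L1s)."""
    plan = []
    for l1s, l2s in layers:
        if len(l2s) > len(l1s):
            plan.append(("remove-L2", l2s))   # L2s become multiplexors
        else:
            plan.append(("remove-L1", l1s))   # L1s become wires
    return plan


# Toy netlist (hypothetical): two layers, a1,a2 -> b1 and a3 -> b2,b3.
types = {"a1": "L1", "a2": "L1", "b1": "L2", "a3": "L1", "b2": "L2", "b3": "L2"}
edges = {"a1": {"b1"}, "a2": {"b1"}, "b1": {"a3"}, "a3": {"b2", "b3"}}
for choice, latches in plan_reduction(find_layers(types, edges)):
    print(choice, sorted(latches))
```

In this toy netlist the first layer loses its two L1s and the second loses its two L2s, leaving one state bit per layer.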

If the type of all latches is provided (L1 versus L2), an alternate algorithm 
may simply iterate over each latch within the netlist, and calculate its MDL 
given its type. When this abstraction was initially deployed, no such type data 
was automatically available; the inputs provided a convenient point of reference. 

Theorem 5. This algorithm performs optimum DP reductions. 

Proof. By construction, each latch will be a member of exactly one MDL. 
Furthermore, the MDLs are of minimum size, resulting in a maximum number of 
dependent layers in the netlist. Since each MDL is independent of the others, 
the locally optimal solutions yield a globally optimum result. 

Note that along any input - output path within a single MDL, exactly one 
flip-flop must exist after abstraction - if zero or two exist, the bisimulation is 
clearly broken. Take any latch from any MDL which was removed by the 
abstraction - assume that it is an L1 latch L1'. All L2s L2' in the fanout of L1' 
must remain. All L1s in the fanin of L2' must have been removed, and so on until 
we are left with the case that (within this MDL) all of the L1s are removed, and 
all of the L2s remain, if this is a correct abstraction (similar reasoning applies to 
consideration of a removed L2 latch). This demonstrates optimality of reduction 
of each MDL. 
The algorithm may be optimized to ensure that each combinatorial gate (or 
net) is only considered once in fanout traversal, and once in fanin traversal, to 
ensure that its complexity is O((netlist size)^2). However, in practice, we have 
found that the complexity of this algorithm grows roughly linearly with model 
size and takes a matter of seconds for even the largest designs we have considered 
for model checking (more than 8,000 latches). This near-linearity is not surprising 
for synthesizable high-performance netlists, since the depth of combinatorial 
logic between latches and the number of sinks of a net are restricted to ensure 
that the netlist meets timing constraints. 

5 Experimental Results 

The above algorithm was implemented in the model checker RuleBase [4], 
developed at the IBM Haifa Research Lab as an extension to SMV [9]. It is utilized 
as a first-pass netlist reduction technique; the reduced full-cycle model is saved 
and used as the basis for further optimizations before being passed to SMV for 
model checking. 

This algorithm was deployed for use on many components of IBM’s Giga- 
hertz Processor. The reduction results obtained by this step are given in Table 1 
below. These numbers do not reflect the results of any other reduction techniques. 
We recommend, due to the speed of this algorithm (O(n^2) in theory, but 
roughly O(n) in practice) and its global preservation of L2-visible properties, 
that it be used as a first-pass reduction technique upon design compilation. The 
resulting abstracted design may then be analyzed for formula-specific reductions 
(e.g., cone-of-influence, constant propagations, retiming), which are likely to 
proceed faster upon the abstracted design due to the smaller number of latches and 
simpler transition relation (the clock is no longer in the support of the transition 
relation). 



Table 1. DP Reduction Results 

Logic Function                                State Bits          State Bits
                                              Before Reduction    After Reduction
Load Serialization Logic                      8096                2586
L1 Reload Logic                               3102                1418
Instruction Flushing Logic                    138                 69
Instruction-Fetch Address Generation Logic    4891                2196
Branch Logic                                  6918                3290
Instruction Issue Logic                       6578                3249
Tag Management Logic                          578                 289
Instruction Decode Logic                      1980                978
Load / Store Control                          821                 409



During the initial stages of model checking, this abstraction was not available. 
Once the abstraction became available, properties which previously took many 
hours to complete would finish in several minutes. More encompassing properties 
became feasible on the abstracted model which would not otherwise complete. 

As a small example, a property run on the Load Serialization Logic which 
took 25.6 seconds, 36 MB of memory on the abstracted model (with 81 variables) 
took 450.2 seconds, 92 MB of memory for the unabstracted netlist (with 116 
variables) on the same machine (an IBM RS/6000 Workstation Model 590 with 2 
GB main memory), with no initial BDD order. This time includes that necessary 
to perform the netlist analysis and reduction. As a larger example, a property 
run on the Instruction Flushing Logic took 852 seconds of user time, 48 MB on 
the abstracted model (with 96 variables). This same property did not complete 
on the unabstracted netlist (with 162 variables) within 72 hours. 

While it may seem surprising that the number of variables after abstraction 
is more than half that before abstraction, this is due to two phenomena. First, 
some of these variables are used for environment and specification; these are 
modeled directly as flip-flops (rather than L1-L2 latches). Second, in some cases, 
RuleBase was able to exploit some redundancy among these variables through 
other model reduction techniques (e.g., constant simulation). 
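
The arithmetic behind these observations can be checked with a short sketch. The state-bit pairs are copied from Table 1 in table order, and the variable counts and run times are those quoted above; the helper name ratio is a hypothetical choice for illustration.

```python
# (state bits before reduction, state bits after reduction), in Table 1 order.
TABLE_1 = [
    (8096, 2586), (3102, 1418), (138, 69), (4891, 2196), (6918, 3290),
    (6578, 3249), (578, 289), (1980, 978), (821, 409),
]


def ratio(before, after):
    """Fraction of the original size that remains after the reduction."""
    return after / before


netlist_ratios = [ratio(b, a) for b, a in TABLE_1]
# Every netlist in Table 1 is reduced to at most half of its state bits.
print(max(netlist_ratios))

# Model-checking variables quoted in the text (unabstracted, abstracted).
load_serialization = ratio(116, 81)
instruction_flushing = ratio(162, 96)
print(round(load_serialization, 2), round(instruction_flushing, 2))

# Run-time improvement for the small Load Serialization example.
print(round(450.2 / 25.6, 1))
```

The netlist ratios never exceed one half, whereas the variable ratios for the two quoted properties do, which is the gap the paragraph above attributes to environment and specification variables.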

The benefits obtained by this algorithm extend beyond a mere reduction 
in state depth, which reduces the time and memory consumed by reachability 
calculations. BDD variable reordering time is often greatly reduced (since the 
BDDs tend to be smaller, and since with fewer variables a “good ordering” tends 
to be faster to compute). The reduction to full-cycle models also reduces the 
number of image calculations necessary to reach a fixed-point or on-the-fly failure 
- the diameter of the model is halved. Further, since fewer state variables require 
evaluation, it is possible that the above reduction may be exploited to “collapse” 
adjacent functions to a single function, which may be represented on the same 
BDD. However, this risks blowing up the BDD size; the functions may thus 
remain distinct and implicitly conjoined [10] to ensure proper evaluation. 

With this abstraction available, as demonstrated above, model checking could 
verify much “larger” and more meaningful properties in less time. 
Users of our tool have found that writing specifications and environments for 
the full-cycle abstracted models is much less complex than for the corresponding 
half-cycle netlists (as is viewing traces). All RuleBase users quickly converted 
to running exclusively with this abstraction. There have been many hundreds of 
formulae written and model checked to date on this project, which collectively 
have exposed on the order of 200 bugs at various design stages. We have not 
encountered any properties we wished to specify which became impossible on 
the abstracted model. This algorithm thus provided an efficient and necessary 
means by which to free ourselves from the verification burdens imposed by the 
low level of the implementation. 

It is noteworthy that roughly 70 HDL bugs were isolated due to violations 
of L1-L2 connectivity during this work. While algorithms for detecting such 
problems are simple (and other tools implementing such checks became availa- 
ble later in the design cycle), the many benefits resulting from this reduction 
provided strong motivation for quickly correcting these errors. Due to the 
nature of logic interpretation in simulation and model checking frameworks, the 
logic flawed in such a manner typically behaved “properly” for verification - 
these platforms assume zero combinatorial delay, but no combinatorial “flow- 
through” for two adjacent level-sensitive latches even if both are simultaneously 
in the transparent phase. 

6 Conclusions 

We have developed an efficient algorithm for identifying and abstracting dual- 
phase L1-L2 netlists. The algorithm performs netlist graph traversal, rather than 
FSM analysis, hence is CPU-efficient - O(n^2) in theory, but roughly O(n) in 
practice due to timing constraints imposed upon synthesizable netlists. The be- 
nefits obtained by the abstraction include much smaller verification time and 
memory requirements (through “shallower” state depth - often less than one- 
half that necessary without the abstraction - which reduces complexity of the 
transition relation and simplifies BDD reordering, and a halving of the diameter 
of the model), as well as more abstract specification and environment definitions. 
A bisimulation relation is established between the unreduced and reduced mo- 
dels. This reduction is optimum, and is valid for model checking CTL* formulae 
which reason solely about latches of a given phase. Experimental results from 
the deployment of this algorithm (as implemented in the model checker Rule- 
Base) upon IBM’s Gigahertz Processor are provided, and illustrate its extreme 
practical benefit. 
Abstract. The design of control units of modern processors is quite 
complex due to many speed-up techniques like pipelining and out-of- 
order execution. The existing approaches to formal verification of pro- 
cessor designs are applicable to very high level descriptions that ignore 
timing details of control signals. In this paper, we propose an approach 
for verification of detailed designs of processors. Our approach suggests 
the use of the Esterel language, which has rich constructs for succinct and 
modular description of control. The Esterel simulation tool Xes and the 
verification tools Xeve and FcTools can be used effectively to catch minor 
bugs as well as subtle timing errors. As an illustration, we have developed 
an Esterel implementation of DLX pipeline control and verified certain 
crucial properties. 



1 Introduction 

Modern processors employ many techniques like pipelining, branch prediction 
and out-of-order execution to enhance their performance. The design and va- 
lidation of these processors, especially their control circuitry, is a challenging 
task [6,7]. 

Formal verification techniques, emerging as a viable approach to valida- 
tion [10], are still inadequate in verification of large systems like processors. 
Recently many new techniques have been proposed specifically for processor ve- 
rification [1,7,6,9,11]. These techniques verify that the given implementation is 
equivalent to a simpler sequential model of execution, as described by the in- 
struction set architecture. But in these approaches, the implementation is at 
a very high level of abstraction ignoring details of finer timing constraints on 
control signals. These details are to be introduced to arrive at the final imple- 
mentation that can be realized in hardware. Even if the design at the higher level 
of abstraction is proved to be equivalent to a sequential model, later refinements 
may introduce timing errors. 

The aim of this paper is to propose a verification method for detailed proces- 
sor implementations containing timing constraints of control signals. We suggest 
the use of the Esterel language [3,2] and its associated verification tools for 
describing the implementations and verifying their properties. Esterel has a number 
of attractive features that come in handy for our purpose. It provides a nice 
separation between data and control. It offers a rich set of high level constructs, 
like preemption, interrupts and synchronous parallelism, that are natural for 
hardware systems and that enable modular and succinct description of complex 
controllers. Besides simulation, Esterel descriptions can be rigorously verified 
using the tools Xeve [4] and FcTools [5]. Finally, Esterel programs can be 
directly translated into hardware. 

In this paper we illustrate our approach by developing an Esterel model of 
the DLX pipelined processor control unit [8]. The model has been debugged 
using the simulator tool Xes and has been verified to satisfy a number of desired 
properties using the verification tools. 



2 Esterel Specification of Pipelined Control Unit 

The specification is based upon the informal description of the DLX processor 
given in [8]. We confine ourselves to the control unit specification; the data path 
specification can be trivially given using a host language like C. 

2.1 The Main Controller 

The execution of an instruction in the DLX processor goes through five stages: 
Instruction Fetch (IF), Instruction Decode/Register Fetch (ID), Execution/Effective 
Address Calculation (EX), Memory Access/Branch Completion (MEM), 
and Write-Back (WB). The introduction of pipelining leads to increased 
complexity in design in terms of additional registers and control logic due to various 
hazards. Pipeline registers are required to store the intermediate values produced 
by different stages. DLX uses the branch-not-taken prediction scheme and hence, 
to handle the control hazard that occurs when a branch is taken (determined in 
the EX stage), the instruction in the ID stage must be squashed; the handling 
of interrupts requires even more complex control logic. Appropriate actions like 
data forwarding or stalling have to be taken to handle data hazards, for instance 
when an instruction updates a register or memory location that is read by a 
subsequent instruction. 

Figure 1 gives an Esterel module that models a generic pipeline stage of the 
DLX controller. An Esterel program in general consists of one or more modules. 
Each module has an input-output interface and reactive code that is executed 
module XXUnit:

input GoPrev, Stall, Restart;
output GoNext, StallPrev, RestartPrev;

loop                              % execute the `loop' body repeatedly
  do                              % the `body' of the `do-watching' statement starts here
    signal Go in                  % Go is a local signal
    [
      suspend                     % stop execution
      [
        loop
          await immediate Go;     % wait till the other component emits `Go'
          emit GoNext;            % generate the signal GoNext
          run XX;                 % execute the module named XX
          await tick              % wait for one reaction
        end loop
      ||
        loop
          await immediate GoPrev; % wait till `GoPrev' is present in the input
          await tick;             % wait one reaction step
          emit Go                 % generate `Go' signal
        end loop
      ]
      when immediate Stall        % stop execution of the `suspend' body when `Stall'
                                  % is present
    ||
      loop
        await tick;
        await immediate Stall;
        emit StallPrev
      end
    ]
    end signal                    % end of scope of local signal declaration
  watching Restart;               % abort the `watching' body when `Restart' is present
  emit RestartPrev
end loop                          % end of the outermost loop
end module                        % end of the module



Fig. 1. A pipeline stage in Esterel 
periodically at the phase of the built-in signal tick. Every time a module is 
executed, it reads input signals and depending upon the state of the module ge- 
nerates appropriate output signals and changes the state. Every such execution 
is called a reaction. A reaction is assumed to be instantaneous so that there is no 
time delay between input consumption and output generation. All Esterel 
statements are instantaneous except the ‘halt’ statement, which does not terminate 
at all. The control of an Esterel program resides at one or more halt statements 
(more than one when there are concurrent components) which decide the state 
of the program. A reaction, besides generating outputs, results in a change of 
state with the movement of control points from one set of halt statements to 
another. 

Esterel possesses a rich set of constructs for describing control. Here we give 
a very brief explanation of some of these constructs. The statement await S is a 
simple ‘wait construct’ that delays termination until the signal S is present in the 
input; await immediate S is a variant which can terminate even in the very first 
instant when control reaches the construct. The statement do stat watching 
S continues to execute stat as long as the signal S is not present; the moment S 
appears on the input, the whole statement terminates, aborting the computation 
inside stat. The statement suspend stat when S suspends the execution of 
stat in all reactions in which S is present; execution continues where it got 
suspended when S is not present. 

Now we will describe the behavior of the module in Figure 1. For the sake 
of simplicity, we have taken the tick signal to define the clock of the processor. 
Suppose the signals Stall and Restart are not present in a reaction, correspon- 
ding to the uninterrupted flow of an instruction through the pipeline stages. 
Then the submodule XX (in the first branch of the parallel operator within the 
suspend statement) is executed in the cycle when the local signal Go is present; 
the Go signal is present in this cycle provided the GoPrev signal was present in the 
previous cycle (in the second branch of the parallel operator within the suspend 
statement). At the end of execution of XX, which is assumed to be instantaneous, 
the module generates GoNext. 

Suppose that Stall is present in a cycle, representing a hazard in the pipeline 
stage XX. Then the execution of XX is suspended by the suspend statement and 
the signal GoNext is not generated; the signal StallPrev is generated (in the 
second branch of the outer parallel operator). If on the other hand the Restart 
signal is present, representing an interrupt or a taken branch, then the body of 
the outer watchdog primitive is killed and the execution is restarted because of 
the presence of the outer loop construct. This results in the loss of information 
about the presence of the GoPrev signal in the previous cycle. Also a Restart 
triggers a RestartPrev signal. 

Thus, XXUnit executes the submodule XX one cycle after each cycle in which GoPrev is 
present and generates GoNext, as long as neither Stall nor Restart is present. 
A Stall in a cycle suspends the execution of XX while a Restart restarts the 
execution of the whole module afresh, resetting its internal state, i.e., it squashes the 
execution of XX. 
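
The behavior just described can be approximated by a cycle-level sketch in Python. It is a simplified reading of Figure 1, not a translation of the Esterel semantics: the class name StageUnit and its fields are hypothetical, first-reaction corner cases are ignored, and a Restart is assumed to discard the GoPrev of the current cycle as well as the remembered one.

```python
from dataclasses import dataclass


@dataclass
class StageUnit:
    """Cycle-level approximation of the XXUnit module (assumed model)."""
    pending: bool = False  # GoPrev was seen in the previous non-stalled cycle

    def react(self, go_prev, stall=False, restart=False):
        """Perform one reaction and return the emitted output signals."""
        if restart:
            # The body is aborted: the remembered GoPrev is lost.
            self.pending = False
            return {"GoNext": False, "StallPrev": False, "RestartPrev": True}
        if stall:
            # The suspended body is frozen: `pending` keeps its value.
            return {"GoNext": False, "StallPrev": True, "RestartPrev": False}
        go = self.pending          # local signal Go of this cycle
        self.pending = go_prev     # remembered for the next cycle
        return {"GoNext": go, "StallPrev": False, "RestartPrev": False}


unit = StageUnit()
trace = [unit.react(g)["GoNext"] for g in [True, False, True, True, False]]
print(trace)  # GoNext follows GoPrev with a delay of one cycle
```

A stalled cycle leaves the pending instruction in place, so it is executed one cycle later; a restarted cycle drops it.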
module CONTROL:
input IssueNextInstr;
output InstrCompleted;

output WritePCn : integer, WritePCb : integer;
inputoutput RestartO, RestartIF, RestartID,
            RestartEX, RestartMEM, RestartWB;
inputoutput StallO, StallIF, StallID, StallEX, StallMEM, StallWB;

signal GoIF, GoID, GoEX, GoMEM
in
  run IFUnit [ signal IssueNextInstr / GoPrev, GoIF / GoNext,
                      StallIF / Stall, StallO / StallPrev,
                      RestartIF / Restart, RestartO / RestartPrev]
||
  run IDUnit [ signal GoIF / GoPrev, GoID / GoNext,
                      StallID / Stall, StallIF / StallPrev,
                      RestartID / Restart, RestartIF / RestartPrev]
||
  run EXUnit [ signal GoID / GoPrev, GoEX / GoNext,
                      StallEX / Stall, StallID / StallPrev,
                      RestartEX / Restart, RestartID / RestartPrev]
||
  run MEMUnit [ signal GoEX / GoPrev, GoMEM / GoNext,
                       StallMEM / Stall, StallEX / StallPrev,
                       RestartMEM / Restart, RestartEX / RestartPrev]
||
  run WBUnit [ signal GoMEM / GoPrev, InstrCompleted / GoNext,
                      StallWB / Stall, StallMEM / StallPrev,
                      RestartWB / Restart, RestartMEM / RestartPrev]
end signal
end module

Fig. 2. The control unit for the DLX pipeline stages 
The Esterel module in Figure 2 models the behavior of the entire pipeline 
controller. Each pipeline stage is an instantiation of the generic module XXUnit 
given in Figure 1; for example, IFUnit is obtained from XXUnit by replacing the 
command run XX by run IF where the module IF, shown in Figure 3, describes 
the behavior of the instruction fetch stage. 

In the module CONTROL, the renaming of the Go, Stall and Restart signals 
leads to the establishment of a forward Go-chain and two reverse chains, the 
Stall-chain and the Restart-chain. When there is no Stall signal (none of StallIF, ..., StallWB 
is present), the input IssueNextInstr signal triggers the execution of the five 
stages, with the execution of each stage in a cycle triggering via the Go-chain 
the execution of the next stage in the next cycle. When StallXX is present, it 
stalls the pipeline up to stage XX; this is achieved by the instantaneous 
transmission of the various Stall signals to the preceding stages via the Stall-chain. 
The succeeding stages are not affected by this stall. Similarly, a Restart signal 
triggers the restart of all the earlier stages up to the current stage using the 
Restart-chain. 
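
The three chains can be illustrated with a small cycle-level simulation. This is a sketch under stated assumptions rather than the Esterel program itself: the function run_pipeline and its arguments are hypothetical, each stage is reduced to a single "pending" flag, and an externally raised Stall or Restart at one stage is assumed to reach all earlier stages in the same cycle.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def run_pipeline(issue, stall_at=None, restart_at=None):
    """Simulate the Go-, Stall- and Restart-chains of the five stages.

    issue: list of booleans, IssueNextInstr per cycle.
    stall_at / restart_at: dict mapping cycle -> stage name at which the
        signal is raised externally (hypothetical input format).
    Returns the list of InstrCompleted values per cycle.
    """
    stall_at = stall_at or {}
    restart_at = restart_at or {}
    pending = [False] * len(STAGES)
    completed = []
    for cycle, issued in enumerate(issue):
        # Reverse chains: a signal at stage k reaches stages 0..k at once.
        stalled_upto = STAGES.index(stall_at[cycle]) if cycle in stall_at else -1
        restarted_upto = STAGES.index(restart_at[cycle]) if cycle in restart_at else -1
        go_prev = issued
        for k in range(len(STAGES)):
            if k <= restarted_upto:
                pending[k] = False      # squashed
                go_next = False
            elif k <= stalled_upto:
                go_next = False         # frozen, pending[k] is kept
            else:
                go_next = pending[k]    # forward Go-chain
                pending[k] = go_prev
            go_prev = go_next
        completed.append(go_prev)
    return completed


one_instruction = [True] + [False] * 7
print(run_pipeline(one_instruction).index(True))
print(run_pipeline(one_instruction, stall_at={3: "EX"}).index(True))
print(any(run_pipeline(one_instruction, restart_at={2: "ID"})))
```

In this model the completion appears in the fifth reaction after the issuing one, that is, four cycles after the IF stage executes; a stall at EX delays it by one cycle, and a restart at ID squashes the instruction.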



2.2 The Pipeline Stages 

The Esterel specification of the various pipe stages which instantiate XX in Fi- 
gure 1 can now be described. Because of space constraints, we describe only the 
IF and EX stages. 



module IF:

input ReadPC : integer, BranchTaken;
output WritePCn : integer, IfOut : integer;
function FetchInstr (integer) : integer;
function IncrPC (integer) : integer;

emit IfOut (FetchInstr (?ReadPC));

present BranchTaken
else
  emit WritePCn (IncrPC (?ReadPC))
end present;
end module



Fig. 3. IF Stage 



The module IF in Figure 3 emits a signal IfOut with a value representing the 
current instruction and a signal WritePCn whose value indicates the new value of 
PC. The signal BranchTaken indicates a taken branch, and the IF stage writes 
a PC value only if this signal is absent, indicating a normal flow of execution. If 
the BranchTaken signal is present, the PC value is written by the EX stage, shown 
in Figure 4, through a signal called WritePCb to indicate a branch in instruction 
execution. The external functions FetchInstr and IncrPC abstract the actions 
corresponding to fetching an instruction and incrementing the PC. 



module EX:

input BranchTaken, Bypass, MemInAdr : integer, MemInVal : integer,
      ExInOpcode : integer, ExInOpnd : integer;
output ExOutAdr : integer, ExOutVal : integer, WritePCb : integer;
function AluOpAdr (integer, integer) : integer;
function AluOpVal (integer, integer) : integer;

present Bypass then
  emit ExOutAdr (AluOpAdr (?ExInOpcode, ?MemInVal));
  emit ExOutVal (AluOpVal (?ExInOpcode, ?MemInVal))
else
  emit ExOutAdr (AluOpAdr (?ExInOpcode, ?ExInOpnd));
  emit ExOutVal (AluOpVal (?ExInOpcode, ?ExInOpnd))
end present;

present BranchTaken then
  emit WritePCb (AluOpAdr (?ExInOpcode, ?ExInOpnd))
end present
end module



Fig. 4. EX Stage 



The module EX in Figure 4 emits two signals ExOutAdr and ExOutVal, cor- 
responding to the address and value computed by the ALU by operations ab- 
stracted by the external functions AluOpAdr and AluOpVal. The presence of the 
input signal Bypass indicates that there is a data hazard and hence that the 
inputs to ALU are to be taken through a forwarding process from the output 
of the EX/MEM pipe stage; in the absence of this signal, the inputs come from 
the ID/EX pipe stage. The BranchTaken signal indicates a taken branch and 
triggers the signal WritePCb which writes the new branch address into PC. 
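
The data-independent behavior of the IF and EX modules can be paraphrased as two Python functions. This is only a sketch: the external functions are passed in as parameters, emitted signals are modeled as dictionary entries, and the toy functions used in the usage lines are placeholders chosen for illustration.

```python
def if_stage(read_pc, branch_taken, fetch_instr, incr_pc):
    """Signals emitted by the IF module in one reaction (assumed model)."""
    out = {"IfOut": fetch_instr(read_pc)}
    if not branch_taken:
        # The IF stage writes the PC only in the normal flow of execution.
        out["WritePCn"] = incr_pc(read_pc)
    return out


def ex_stage(opcode, opnd, mem_in_val, bypass, branch_taken,
             alu_op_adr, alu_op_val):
    """Signals emitted by the EX module in one reaction (assumed model)."""
    # Bypass selects the forwarded value instead of the ID/EX operand.
    operand = mem_in_val if bypass else opnd
    out = {
        "ExOutAdr": alu_op_adr(opcode, operand),
        "ExOutVal": alu_op_val(opcode, operand),
    }
    if branch_taken:
        # As in Figure 4, the branch address uses the ID/EX operand.
        out["WritePCb"] = alu_op_adr(opcode, opnd)
    return out


# Placeholder external functions, for illustration only.
fetch = lambda pc: pc * 10
incr = lambda pc: pc + 4
adr = lambda op, x: op + x
val = lambda op, x: op * x

print(if_stage(8, False, fetch, incr))
print(if_stage(8, True, fetch, incr))
print(ex_stage(2, 3, 7, True, True, adr, val))
```

Exactly one of WritePCn and WritePCb is produced for a given instruction in this model, depending on whether BranchTaken is present.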

The above Esterel model of the DLX processor has abstracted away details 
about the data path, instruction decoding, alternative actions based on various 
types of instructions (such as load/store) and hazard detection. This is the reason 
that the signals Bypass, Restart, BranchTaken and Stall have been modeled 
as external input signals, rather than being generated internally (by hazard 
detection units). 

3 Validation Using Esterel Tools 

In this section we outline the validation of the design of the DLX processor 
control unit using the Esterel simulation tool Xes and verification tools Xeve and 
FcTools. We focus on the micro-properties of the control unit, such as smooth 
flow of instructions through the pipeline, absence of deadlock, proper issuing of 
stall and restart instructions, and correct behavior of the pipeline with respect to 
these signals. We are able to verify, for example, that in case of a taken branch 
(determined in the EX stage) the instruction following the branch (in its ID 
stage) is restarted or aborted. Similarly, we can verify that a stall signal sent to 
some stage propagates as a bubble through the pipeline. 

The properties verified by us are finer than the macro-property verified in 
[7], namely that the pipelined machine has the same effect on visible state as 
the sequential one for the same input. The latter property, in its full glory, 
cannot be verified using existing Esterel tools because they deal with only control 
states. However, the property restricted to control states is still verifiable (see 
the paragraph titled Stall in Section 3.1). 



3.1 Verification 

The simple properties of the DLX pipeline controller mentioned above can be 
verified using the Esterel tools Xeve [4] and FcTools [5]. They are verification 
environments, with a user-friendly graphical interface, for Esterel programs 
modeled as finite state machines (FSMs). 

The Esterel compiler generates FSMs implicitly in the form of boolean 
equations with latches. One of the verification tasks performed by Xeve is to take an 
implicit FSM and perform a state minimization using the notion of bisimulation 
equivalence. Before minimization a set of input/output signals can be hidden. 
This results in a nondeterministic FSM where some transitions may be labeled 
by τ, a hidden internal action. Xeve generates minimized FSMs that can be 
further reduced using some abstraction criterion by FcTools and can be graphically 
explored using the tool ATG. 

FcTools is a verification tool set for networks of communicating FSMs. Its 
capabilities include graphical depiction of automata, reduction of automata and 
verification of simple modal properties by observers, counterexample production 
and visualization. 

In our verification process the original FSM produced by Xeve had about 
1500 states, which after making some irrelevant interface signals local got 
reduced to 543 reachable states. This was reduced to 16 states and 72 transitions 
after applying the observational equivalence minimization procedure available in 
FcTools. Still the automaton could not be inspected due to the large number of 
transitions. So we used the powerful abstraction technique available in FcTools 
to further reduce the size of the automaton. An abstraction criterion defines a 
new set of action symbols that are regular expressions on the action symbols 
in the original automaton. The reduction involves abstraction of sequences of 
old actions into new actions so that the reduced automaton contains only new 
action symbols; further, certain paths in the original automaton are eliminated, 
thereby resulting in a small automaton that can be checked easily. 

Depending upon the property to be checked, we applied different criteria to 
get small automata which we verified with respect to appropriate properties. 
Criterion      States    Transitions
Initial        16        72
Smooth Flow    8         12
Stall          16        32
Branch         1         1

Table 1. Sizes of Reduced Automata 



Table 1 summarizes the sizes of the various reduced automata obtained for dif- 
ferent criteria. The details about the criteria ‘Smooth Flow’ and ‘Stall’ are given 
below. The criterion ‘Branch’ checks for proper updating of the PC value at any 
cycle by abstracting paths into two abstract actions ‘success’ and ‘failure’. The 
reduced automaton has only one transition with the label ‘success’. 




Fig. 5. Abstraction Criterion for Smooth Flow 



Smooth flow of instructions. This criterion verifies that every instruction 
issued is completed after four cycles in the absence of stalls and branches. The 
criterion, depicted in Figure 5, defines four abstract actions pipe, pipec, pipebr 
and pipecbr which rename the edges satisfying the corresponding regular 
expressions, e.g., pipebr renames any edge in which a branch has been taken and 
no instruction is completed; in the regular expressions, . denotes synchronous 
product of input and output events (prefixed by ? and ! respectively) and their 
negations (prefixed by '"); the event * matches any event. Figure 6 gives the 
reduced automaton which can be verified with respect to the desired property 
by inspection. 

Validation of Pipelined Processor Designs Using Esterel Tools 

Fig. 6. Reduced Automaton for Smooth Flow 

For the sake of clarity in the figures, the signals StallIF, IssueNextInstr, 
and InstrCompleted of the original automaton are renamed as S, I and IC 
respectively; further the WritePCb signal is treated as being synonymous with 
BranchTaken for technical reasons. 



Stall. The property verified here is that the StallIF signal stalls the IF stage for 
a cycle: no instruction is completed four cycles after a StallIF, assuming later 
stages are not stalled or squashed in the intervening period. The abstraction 
criterion for this is shown in Figure 7 and the reduced automaton in Figure 8. In 
the reduced automaton there is no path of length five starting with a stall or 
a stallc that ends with an ic or stallc edge. Another interesting thing to note 
from this automaton is that from every state there is a sequence of ‘stalls’ that 
leads to the initial state; this property corresponds to the sequential equivalence 
property of [7] for control states. 
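
A property of this shape, the absence of a path of a given length that starts and ends with particular abstract actions, can be checked mechanically on a small labeled automaton. The sketch below is an illustration only: the function has_path is hypothetical, and the six-state ring used as an example is a toy automaton, not the reduced automaton of Figure 8.

```python
def has_path(edges, start_labels, end_labels, length):
    """Return True iff some path of exactly `length` edges begins with an
    edge labeled in start_labels and ends with one labeled in end_labels.

    edges: iterable of (source, label, target) triples.
    """
    edges = list(edges)
    if length < 1:
        return False
    if length == 1:
        return any(l in start_labels and l in end_labels for _, l, _ in edges)
    # States reached after the first edge of a candidate path.
    frontier = {dst for _, l, dst in edges if l in start_labels}
    # Follow arbitrary edges for the middle of the path.
    for _ in range(length - 2):
        frontier = {dst for src, _, dst in edges if src in frontier}
    # The last edge must carry one of the end labels.
    return any(src in frontier and l in end_labels for src, l, _ in edges)


# Toy automaton (hypothetical): a stall followed by four pipe steps
# before an instruction completes.
toy = [
    (0, "stall", 1), (1, "pipe", 2), (2, "pipe", 3),
    (3, "pipe", 4), (4, "pipe", 5), (5, "ic", 0),
]
print(has_path(toy, {"stall", "stallc"}, {"ic", "stallc"}, 5))
print(has_path(toy, {"stall", "stallc"}, {"ic", "stallc"}, 6))
```

On the toy automaton the check of length five fails to find a path while the check of length six succeeds, which mirrors the one-cycle delay that the Stall property expresses.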




Fig. 7. Abstraction Criterion for Stall 



4 Conclusion 

We have proposed the use of the Esterel language and tools for verification of modern 
processors. Esterel can be used to describe, in sufficient detail and in a modular 
and succinct way, control units of processors using its rich set of constructs. 
Complex timing properties of Esterel descriptions can be verified using powerful 
tools. 

We have illustrated the use of Esterel tools for the description of the DLX 
processor. The initial results are encouraging. The verification tools Xes, Xeve and 
FcTools were found to be quite useful in detecting anomalies ranging from 
simple bugs to complex timing errors. We plan to extend our investigation to more 
complex processors involving superscalar features like out-of-order executions. 
We also plan to investigate, in greater detail, the relative merits of Esterel for 
describing control units of processors with respect to the traditional HDLs. 

Abstract. The ABR conformance protocol is a real-time program developed at
France Telecom that controls dataflow rates on ATM networks. A crucial part of
this protocol is the dynamic computation of the expected rate of data cell
emission. We present here a model of the corresponding program, using
parametric timed automata. In this framework, a fundamental property of the
service provided by the protocol to the user is expressed as a reachability
problem. The tool HyTech is then used to compute the set of reachable states of
the model and to prove the property automatically. This case study gives
additional evidence of the importance of the model of parametric timed automata
and of the practical usefulness of symbolic analysis tools.



1 Introduction 

Over the last few years, an extensive amount of research has been devoted to 
the formal verification of real-time concurrent systems. Among the various ap- 
proaches to the analysis of timed models, one of the most successful is based on 
timed automata. Since its first introduction in [3], this model was extended with 
many different features, leading to the general notion of hybrid automata [1,2, 
15]. Although hybrid automata have an infinite number of states, the fixpoint 
computation of reachable states often terminates in practice, thus allowing the 
verification of “safety” properties. This explains the increasing success of the 
development of tools for the analysis of real-time systems [5,8,12], as well as 
the numerous industrial case studies which have already been presented. In this 
paper, we propose an automated verification of correctness for the Available Bit 
Rate (ABR) conformance protocol, developed by France Telecom at CNET (Cen- 
tre National d’Etudes des Telecommunications, Lannion, France) in the context 
of network communications with Asynchronous Transfer Mode (ATM). 



The ABR conformance protocol. ATM is a flexible packet-switching network
architecture, where several communications can be multiplexed over the same
physical link, thus providing better performance than traditional
circuit-switching networks. Different types of ATM connections are possible at
the same time, according to the dataflow rate asked (and paid) for by the
service user [9]. A contract with ABR connection makes it possible for a source
to emit at any time with a rate depending on the load of the network: according
to the available bandwidth, the ABR protocol dynamically computes the highest
possible dataflow rate and sends this information, via so-called Resource
Management (RM) cells, to the user, who has to adapt the transfer rate of data
(D) cells accordingly.

Footnote: Supported by Action FORMA (Programme DSP-STTC/CNRS/MENRT).

The service provider has to control the conformance of emission with respect
to the currently allowed rate, and filter out D cells emitted at an excessive
rate. This is achieved by a program located at an interface between the user and
the network, which receives RM cells on their way to the user as well as D cells
from the user to the network (see Figure 1). This program has two parts: the
easy task is to compare the emission rate of D cells with the rate value
currently allowed, while the difficult problem is to dynamically compute (or
update) the rate values expected for future D cells. The program must take into
account the delays introduced by the transit of cells from the interface to the
user and back, but the exact value of this delay is not known: only lower and
upper bounds, denoted a and b, are given. A simple algorithm called I computes,
from a sequence of rate values carried by previously arrived RM cells, the ideal
(expected) rate $E_t$ for the emission of D cells which will arrive at future
time t. However, since the value of t is not known in advance, an implementation
of I would need to store a huge number of rate values. A more realistic
algorithm, called B' and due to C. Rabadan, has been adopted by CNET. It stores
only two RM cell rates, and dynamically updates an estimated value A of the
ideal rate $E_t$.

Fig. 1. Schematic view of cells traffic



Correctness of program B'. Before being accepted as an international standard
(norm ITU I-371.1), this protocol had to be proved correct: it was necessary to
ensure that the flow control of D cells by comparison with A rather than $E_t$
is never disadvantageous to the user. This means that when some D cell arrives
at time t, A is an upper approximation of $E_t$. In other words, $A \geq E_t$
when current time reaches t. This property U was proved by hand by Monin and
Klay, using a classical method of invariants [14]. However, since this proof was
quite difficult to obtain, CNET felt the need for formal methods and tools to
verify B' in a more mechanical way, as well as future versions of B' currently
under development.

This paper presents a model of algorithms I and B' as parametric timed automata
[4], and an automated proof of property U (viewed as a reachability problem) via
the tool HyTech [12].



Plan of the paper. Section 2 presents the model of parametric timed automata.
Section 3 describes algorithms I and B' and correctness property U within this
framework. Section 4 gives the experimental results obtained with HyTech and
a comparison with previous work. Section 5 concludes with final remarks.



2 Parametric Timed Automata 

We use here a model of parametric timed automata, called p-automata for short,
which are extensions of timed automata [3] with parameters. A minor difference
with the classical parametric model of Alur-Henzinger-Vardi [4] is that we have
only one clock variable S and several “discrete” variables $w_i$, while in [4]
there are several clocks and no discrete variable. One can retrieve (a close
variant of) Alur-Henzinger-Vardi parametric timed automata by changing our
discrete variable $w_i$ into $S - w_i$ (see [10]). Alternatively, our parametric
automata can simply be viewed as particular cases of linear hybrid automata
[1,2,15].



P-automata. In addition to a finite set of locations, p-automata have a finite
set P of parameters, a finite set W of discrete variables and a universal clock
S. These are all real-valued variables which differ only in the way they evolve
when time increases. Parameter values are fixed by an initial constraint and
never evolve later on. Discrete variable values do not evolve either, but they
may be changed through instantaneous updates. A universal clock is a variable
whose value increases uniformly with time (without reset).

Formally, a parametric term is an expression of the form
$w + \sum_{i=1}^{k} p_i + c$, $S + \sum_{i=1}^{k} p_i + c$ or
$\sum_{i=1}^{k} p_i + c$, where $w \in W$, $p_i \in P$ and $c \in \mathbb{N}$.
(As usual, by convention, a term without parameter corresponds to the case where
$k = 0$.) An atomic constraint is an expression $term_1 \bowtie term_2$, where
$term_1$, $term_2$ are parametric terms and
$\bowtie \in \{<, \leq, =, \geq, >\}$. A constraint is a conjunction of atomic
constraints. The formulas used in p-automata are location invariants, guards and
update relations. A location invariant is a conjunction of atomic constraints. A
guard is a conjunction of atomic constraints with possibly the special
expression asap. An update relation is a conjunction of formulas of the form
$w' \bowtie term$, where $w'$ belongs to a primed copy of W, term is a
parametric term and $\bowtie \in \{<, \leq, =, \geq, >\}$. As usual, $w' = w$ is
implicit if $w'$ does not appear in the update relation.

A p-automaton $\mathcal{A}$ is a tuple $(L, \ell_{init}, P, W, S, I, \Sigma, T)$,
where

- L is a finite set of locations, with initial location $\ell_{init} \in L$,

- P and W are respectively the sets of parameters and discrete variables, S is
the universal clock,

- I is a mapping that labels each location $\ell$ in L with some location
invariant, simply written $I_\ell$ instead of $I(\ell)$ in the following,

- $\Sigma$ is a finite set of labels partitioned into synchronization labels and
internal labels,

- T is a set of action transitions of the form
$(\ell, \varphi, a, \theta, \ell')$, where $\ell$ and $\ell'$ belong to L,
$\varphi$ is a guard, $a \in \Sigma$ is a label and $\theta$ an update relation.
The transition is urgent if its guard $\varphi$ contains the symbol asap.



Semantics of p-automata. We briefly and informally recall the semantics of
timed automata (see [4] for details), described in terms of transition systems.
For a p-automaton $\mathcal{A}$, the (global) state space of the transition
system is the set
$Q_{\mathcal{A}} = L \times \mathbb{R}^P \times \mathbb{R}^W \times \mathbb{R}$
of tuples $(\ell, \gamma, v, s)$, where $\ell$ is a location of $\mathcal{A}$,
$\gamma : P \to \mathbb{R}$ is a parameter valuation, $v : W \to \mathbb{R}$ is
a data valuation and s is a real value of the clock S. A region is a subset of
states of the form
$\{(\ell, \gamma, v, s) \mid \varphi \text{ holds for } (\gamma, v, s)\}$, for
some location $\ell$ and some constraint $\varphi$, written
$\ell \times \varphi$.

The set $Q_{init}$ of initial states is the region
$\ell_{init} \times \varphi_{init}$, for some constraint $\varphi_{init}$: the
automaton starts in its initial location, with some given initial constraint.
(From this point on, the parameter values are not modified.)

A state $q = (\ell, \gamma, v, s)$ is urgent if there exists some action
transition e, with source location $\ell$ and a guard of the form
$\varphi \wedge asap$, such that $\varphi$ holds for $(\gamma, v, s)$: some
urgent transition is enabled. From a non-urgent state
$q = (\ell, \gamma, v, s)$, the automaton can spend some time
$\varepsilon > 0$ in location $\ell$, provided the invariant $I_\ell$ remains
true. This delay move results in state
$q' = (\ell, \gamma, v, s + \varepsilon)$ (nothing else is changed during this
time). Since location invariants are convex formulas, if $I_\ell$ is satisfied
for $s$ and $s + \varepsilon$, then it is also satisfied for any $s + \alpha$
with $0 < \alpha < \varepsilon$.

From a state $q = (\ell, \gamma, v, s)$, the automaton can also apply some
action transition $(\ell, \varphi, a, \theta, \ell')$, provided guard $\varphi$
is true for the current valuations $(\gamma, v, s)$. In an instantaneous action
move, the valuation of discrete variables is modified from $v$ to $v'$ according
to update relation $\theta$ and the automaton switches to target location
$\ell'$, resulting in state $q' = (\ell', \gamma, v', s)$.

A successor of a state q is a state obtained either by a delay or an action
move. For a subset Q of states, $Post^*(Q)$ is the set of iterated successors of
the states in Q. Similarly, the notions of predecessor and set $Pre^*(Q)$ can be
defined.
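
As an informal illustration of the two kinds of moves, the following Python
sketch applies a delay move or an action move to a single concrete state. It
rests on simplifying assumptions that are not part of the model above: the names
State, delay_move and action_move are hypothetical, invariants and guards are
plain Python predicates over (gamma, v, s), the update relation is restricted to
a deterministic function, and urgency (asap) is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Valuation = Dict[str, float]
# A constraint is modelled as a predicate over (gamma, v, s).
Constraint = Callable[[Valuation, Valuation, float], bool]
# Assumption: an update is a function returning the new values of some variables.
Update = Callable[[Valuation, Valuation, float], Valuation]


@dataclass
class State:
    loc: str           # current location
    gamma: Valuation   # parameter valuation (never modified)
    v: Valuation       # valuation of the discrete variables
    s: float           # value of the universal clock S


def delay_move(q: State, eps: float, invariant: Constraint) -> Optional[State]:
    """Let eps > 0 time units pass in q.loc, if the invariant allows it."""
    if eps <= 0:
        raise ValueError("a delay must be strictly positive")
    # Invariants are convex, so checking both end points is enough.
    if invariant(q.gamma, q.v, q.s) and invariant(q.gamma, q.v, q.s + eps):
        return State(q.loc, q.gamma, dict(q.v), q.s + eps)
    return None


def action_move(q: State, guard: Constraint, update: Update,
                target: str) -> Optional[State]:
    """Fire a transition instantaneously: only the location and v may change."""
    if not guard(q.gamma, q.v, q.s):
        return None
    new_v = dict(q.v)
    new_v.update(update(q.gamma, q.v, q.s))  # unmentioned variables keep their value
    return State(target, q.gamma, new_v, q.s)
```

For instance, with a location whose invariant is S <= t and a transition guarded
by S = t, a delay of exactly t time units from clock value 0 is allowed, a longer
one is refused, and the guarded transition becomes enabled only after the delay.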



Synchronized product of p-automata. Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be
two p-automata with a common universal clock S. The synchronized product (or
parallel composition, see e.g. [12]) $\mathcal{A}_1 \times \mathcal{A}_2$ is a
p-automaton with S as universal clock and the union of the sets of parameters
(resp. discrete variables) of $\mathcal{A}_1$ and $\mathcal{A}_2$ as set of
parameters (resp. discrete variables). Locations of the product are pairs
$(\ell_1, \ell_2)$ of locations from $\mathcal{A}_1$ and $\mathcal{A}_2$
respectively. Constraints associated with locations (invariants, initial
constraint) are obtained as the conjunction of the components' constraints. The
automata move independently, except when transitions from $\mathcal{A}_1$ and
$\mathcal{A}_2$ have a common synchronization label. In this case, both automata
perform a synchronous action move, the associated guard (resp. update relation)
being the conjunction of both guards (resp. update relations). For simplicity we
suppose here that synchronized transitions are non-urgent.
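
The discrete skeleton of this construction can be sketched as follows. The
function name sync_product and the encoding of transitions as (source, label,
target) triples are assumptions made for illustration only; guards, update
relations and invariants (which would simply be conjoined) are left out, and a
synchronization label is assumed to belong to the alphabet of both components.

```python
from typing import Iterable, List, Set, Tuple

Transition = Tuple[str, str, str]                      # (source, label, target)
ProductTransition = Tuple[Tuple[str, str], str, Tuple[str, str]]


def sync_product(locs1: Iterable[str], trans1: Iterable[Transition],
                 locs2: Iterable[str], trans2: Iterable[Transition],
                 sync_labels: Set[str]) -> List[ProductTransition]:
    """Transitions of the product: joint moves on synchronization labels,
    independent (interleaved) moves on internal labels."""
    locs1, trans1 = list(locs1), list(trans1)
    locs2, trans2 = list(locs2), list(trans2)
    result: List[ProductTransition] = []
    for (s1, a, d1) in trans1:
        if a in sync_labels:
            # Both automata must move together on the common label.
            for (s2, b, d2) in trans2:
                if b == a:
                    result.append(((s1, s2), a, (d1, d2)))
        else:
            # Internal move of the first automaton; the second one stays put.
            for l2 in locs2:
                result.append(((s1, l2), a, (d1, l2)))
    for (s2, b, d2) in trans2:
        if b not in sync_labels:
            # Internal move of the second automaton; the first one stays put.
            for l1 in locs1:
                result.append(((l1, s2), b, (l1, d2)))
    return result
```

The sketch enumerates all pairs of locations, including unreachable ones; a
reachability analysis would then restrict attention to the pairs reachable from
the initial pair.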



Parametric verification. For a given automaton, $Post^*(Q_{init})$ represents
the set of reachable states. For p-automata, we have the following closure
property: if Q is a finite union of regions, also called a zone, then the set of
successors of Q is also a zone. Hence, the output of the computation of
$Post^*(Q_{init})$ (if it terminates) is a zone. Consider now some property U,
such that the set of states violating U can be characterized by a zone
$Q_{\neg U}$. Proving that U holds for the system reduces to proving the
emptiness of the zone $Post^*(Q_{init}) \cap Q_{\neg U}$. Alternatively, it
suffices to prove $Pre^*(Q_{\neg U}) \cap Q_{init} = \emptyset$. Note that we
are interested here in proving that property U holds for all the valuations of
parameters satisfying the initial constraint. The problem is known to be
undecidable in general [4]: there is no guarantee of termination for the
computation of $Post^*$ (or $Pre^*$).
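
The shape of the forward computation can be illustrated on a much simpler
setting. The sketch below is an assumption-laden simplification: it works on an
explicit finite set of states given by a hypothetical successors function,
whereas the computation described above manipulates zones symbolically and may
not terminate. The names post_star and property_holds are illustrative only.

```python
from typing import Callable, Hashable, Iterable, Set


def post_star(init: Iterable[Hashable],
              successors: Callable[[Hashable], Iterable[Hashable]]) -> Set[Hashable]:
    """Least fixpoint of Post containing init (explicit-state version)."""
    reached = set(init)
    frontier = set(init)
    while frontier:
        # One iteration step: successors of the frontier not seen so far.
        frontier = {q2 for q in frontier for q2 in successors(q)} - reached
        reached |= frontier
    return reached


def property_holds(init: Iterable[Hashable],
                   successors: Callable[[Hashable], Iterable[Hashable]],
                   bad: Set[Hashable]) -> bool:
    """The property holds iff no reachable state belongs to the bad set."""
    return post_star(init, successors).isdisjoint(bad)
```

The backward variant is obtained in the same way by iterating a predecessor
function from the bad states and checking that no initial state is reached.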

3 Description and Modelization of the System 

Recall that algorithms I and B' use rate values carried by RM cells to
dynamically compute the rate expected by the network for the conformance test of
future D cells. In order to verify the correctness of B' with respect to I, we
introduce a snapshot action taking place at an arbitrary time t, which will be a
parameter of the model. For our purpose of verification, it is enough to
consider the snapshot as a final action of the system.

We first give p-automata as models for the environment and algorithms I and B'.
Then, in the complete system obtained as a synchronized product of the three
automata, we explain how to check the correctness property. All these p-automata
share a universal clock S, the value of which is the current time s. When the
context makes it clear, we will often use S instead of s.



3.1 A Model of Environment and Observation 

The p-automaton $A_{env}$ modeling the environment (see Figure 2) involves the
parameter t (snapshot time) and a discrete variable R representing the rate
value carried by the last received RM cell. In the initial location Wait, a loop
with label newRM simulates the reception of a new RM cell: the rate R is updated
to a nondeterministic value (R' > 0). The snapshot action has S=t as a guard,
and location Wait is assigned invariant S<=t in order to “force” the switch to
location EndE.

Fig. 2. Automaton $A_{env}$ modeling arrivals of RM cells and snapshot



3.2 Algorithm I

Definition of Ideal Rate. As already mentioned, transmissions are not
instantaneous and parameters a and b represent respectively a lower and an upper
bound of the delay. Recall that s is the current time and t the date of the
snapshot. An RM cell received from the network is relevant to the computation of
the “ideal rate” only if it has been received before s and (1) either it is the
last received before or at time t-b, or (2) it arrived inside the time interval
]t-b, t-a]. The ideal rate $E_t(s)$, estimated at current time s for time t, is
the highest value among these RM cells. In other words, if $n \geq 0$ and
$r_0, r_1, \ldots, r_n$ are the successive arrival times (before s) of RM cells,
such that $r_0 \leq t-b < r_1 \leq r_2 \leq \cdots \leq r_n \leq t-a$, and if
$R_0, R_1, \ldots, R_n$ are the corresponding rate values, then the expected
rate is $E_t(s) = Max\{R_i \mid 0 \leq i \leq n\}$. The case where $n = 0$ is
obtained when no new RM cell arrived between t-b and t-a. Note that in [14], RM
cell arrival times $r_1, r_2, \ldots, r_n$ are additionally assumed to form a
strictly increasing sequence (see Section 4.2).
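
The definition can be read directly as a small function over the list of
received RM cells. This is only a sketch: the name ideal_rate and the encoding
of cells as (arrival time, rate) pairs sorted by arrival time are assumptions,
and "received before s" is interpreted here as "arrival time at most s".

```python
from typing import List, Tuple


def ideal_rate(cells: List[Tuple[float, float]],
               s: float, t: float, a: float, b: float) -> float:
    """Ideal rate E_t(s) for snapshot time t, estimated at current time s.

    cells: (arrival_time, rate) pairs, sorted by arrival time.
    """
    seen = [(r, rate) for (r, rate) in cells if r <= s]
    # (1) the last cell received before or at time t-b, if any
    early = [rate for (r, rate) in seen if r <= t - b]
    # (2) every cell received inside the interval ]t-b, t-a]
    window = [rate for (r, rate) in seen if t - b < r <= t - a]
    relevant = early[-1:] + window
    if not relevant:
        raise ValueError("no relevant RM cell has been received yet")
    return max(relevant)
```

With t = 10, a = 2 and b = 5, the relevant cells are the last one received up to
time 5 and all those received in ]5, 8]; a cell arriving at time 9 is ignored
because it cannot have reached the user and taken effect by time t.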

Incremental algorithm I. Algorithm I proceeds incrementally, by updating a
variable E at each reception of an RM cell, until current time s becomes equal
to t. It is easy to see that, at this time, the value of E is equal to the ideal
rate $E_t(s)$ defined above. More precisely, algorithm I involves variable R and
parameter t (in common with $A_{env}$) and, in addition:

- the two parameters a and b (representing the lower and upper bounds of the
transit time from the interface to the user and back),

- the specific variable E (which will be equal to the ideal rate $E_t(s)$ when
the value of the universal clock S reaches t).

Initially, E and R are equal. Algorithm I reacts to each arrival of a new RM
cell with rate value R by updating E. There are three cases, according to the
position of its arrival time S with respect to t-b and t-a:

1. If S <= t-b (case n = 0 above), E is updated to the new value of R:

[I1] if t >= S+b then E' = R

2. If t-b < S <= t-a, the new ideal rate becomes E' = Max(E,R) (from the
definition and the associativity of Max). To avoid using function Max, this
computation is split into two subcases:

[I2a] if S+a <= t < S+b and E < R then E' = R
[I2b] if S+a <= t < S+b and E >= R then E' = E

3. If S > t-a, the rate E is left unchanged:

[I3] if t < S+a then E' = E

Algorithm I terminates when the snapshot takes place (S=t).
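
Operations [I1] to [I3] translate almost literally into code. In the sketch
below, the function names update_E and run_I, the encoding of RM cells as
(arrival time, rate) pairs sorted by arrival time, and the explicit initial rate
are assumptions made for illustration.

```python
from typing import List, Tuple


def update_E(E: float, R: float, S: float, t: float, a: float, b: float) -> float:
    """New value E' after an RM cell with rate R arrives at time S."""
    if t >= S + b:                 # [I1]: S <= t-b
        return R
    if S + a <= t:                 # here t < S+b also holds: t-b < S <= t-a
        return R if E < R else E   # [I2a] / [I2b], i.e. Max(E, R)
    return E                       # [I3]: t < S+a, E is left unchanged


def run_I(initial_rate: float, cells: List[Tuple[float, float]],
          t: float, a: float, b: float) -> float:
    """Value of E at snapshot time t (initially E = R = initial_rate)."""
    E = initial_rate
    for S, R in cells:
        if S > t:                  # the snapshot has already taken place
            break
        E = update_E(E, R, S, t, a, b)
    return E
```

For example, with t = 10, a = 2, b = 5, an initial rate 10 and cells arriving at
times 2, 6, 7 and 9 with rates 7, 8, 3 and 20, the successive values of E are
7, 8, 8 and 8: only one variable is updated, but the computation depends on t.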

Remark. A program of conformance control based on I would need to store at
each instant s all the rate values of the RM cells received during interval
]s-b, s], which may be very numerous on an ATM network with large bandwidth.



Fig. 3. Automaton $A_I$



Automaton $A_I$. Algorithm I is naturally modeled as p-automaton $A_I$ (see
Figure 3). The initial location is Idle, with initial constraint E = R. The
reception of an RM cell is modeled as a transition newRM from location Idle to
location UpdE. This transition is followed by an urgent (asap) transition from
UpdE back to Idle, which updates E depending on the position of S w.r.t. t-b and
t-a, as explained above. For readability, transitions from UpdE to Idle are
labeled [I1], [I2a], [I2b], [I3] like the corresponding operations. Observation
of the value E corresponds to the transition snapshot from Idle to final
location EndI.

3.3 Algorithm B': Computation of an Approximation

Like I, algorithm B' involves parameters a and b (but not t), and variable R. In
addition, it has six specific variables:

- tfi and tla, which play the role of fi-rst and la-st deadline respectively,

- ACR, for “Approximate Current Rate”, which corresponds to A(s),

- FR, for “First Rate”, which is the value taken by ACR when current time S
reaches tfi,

- LR, for “Last Rate”, which is the value taken by ACR when current time S
reaches tla. It stores the rate value R carried by the last received RM cell.

- Emx, which is just a convenient additional variable, intended to be equal to
Max(FR, LR).

Initially, S=tfi=tla, and the other variables are all equal. Algorithm B' reacts
to two types of events: “receiving an RM cell” and “reaching tfi”.

Receiving an RM cell. When, at current time S, a new RM cell with value R
arrives, the variables are updated according to the relative positions of S+a
and S+b with respect to tfi and tla, and those of R with respect to Emx
and ACR. Among the eight cases (from [1] to [8]), we omit operations [1]
to [5] for lack of space, but they are similar to [6]:

[6] if S < tfi and Emx > R and R >= LR then
    LR' = R, FR' = Emx.

[7] if S >= tfi and ACR <= R then
    LR' = R, FR' = R, Emx' = R, tfi' = S+a, tla' = S+a.

[8] if S >= tfi and ACR > R then
    LR' = R, FR' = R, Emx' = R, tfi' = S+b, tla' = S+b.

Reaching tfi. When the current time S becomes equal to tfi, the approximate
current rate ACR is updated to FR while FR is updated to LR. Moreover, tfi is
updated to tla. There are two cases depending on whether tfi was previously
equal to tla (operation [9a]) or not (operation [9b]). In the first case,
current time S will go beyond tfi (= tla), while in the second case, S will stay
beneath the updated value tla of tfi. We have:

[9a] if tfi = tla then
     ACR' = FR, FR' = LR, Emx' = LR.

[9b] if tfi < tla then
     ACR' = FR, tfi' = tla, FR' = LR, Emx' = LR.

When the events “reaching tfi” (S=tfi) and “receiving an RM cell” occur
simultaneously, operation [9a] (case tfi=tla) or [9b] (case tfi<tla) must be
performed before operations [1], ..., [8] (accounting for the RM cell
reception).

Like I, algorithm B' terminates at snapshot time (S=t). If the snapshot occurs
simultaneously with reaching tfi, operation [9a] or [9b] must be performed
before termination of B'.
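
The operations that are spelled out above, namely [7], [8], [9a] and [9b], can
be sketched as pure functions on the six specific variables. The sketch is
deliberately partial: operations [1] to [6] are not implemented, since [1] to
[5] are not given in the text, and the names BState, reach_tfi and
receive_rm_late are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BState:
    tfi: float   # first deadline
    tla: float   # last deadline
    ACR: float   # approximate current rate
    FR: float    # value taken by ACR when S reaches tfi
    LR: float    # value taken by ACR when S reaches tla
    Emx: float   # intended to be equal to Max(FR, LR)


def reach_tfi(q: BState) -> BState:
    """Event "reaching tfi" (S = tfi): operations [9a] and [9b]."""
    if q.tfi == q.tla:                                                # [9a]
        return replace(q, ACR=q.FR, FR=q.LR, Emx=q.LR)
    if q.tfi < q.tla:                                                 # [9b]
        return replace(q, ACR=q.FR, tfi=q.tla, FR=q.LR, Emx=q.LR)
    raise ValueError("tfi > tla is not covered by [9a] or [9b]")


def receive_rm_late(q: BState, S: float, R: float, a: float, b: float) -> BState:
    """Reception of an RM cell with rate R at time S >= tfi: [7] and [8]."""
    if S < q.tfi:
        raise ValueError("S < tfi: operations [1] to [6] are not sketched here")
    # [7]: a rate that does not decrease takes effect after the shortest delay a;
    # [8]: a decrease takes effect only after the longest delay b.
    deadline = S + a if q.ACR <= R else S + b
    return replace(q, LR=R, FR=R, Emx=R, tfi=deadline, tla=deadline)
```

When both events occur at the same instant, the ordering required above
corresponds to calling reach_tfi before receive_rm_late.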

Automaton $A_{B'}$. Algorithm B' is modeled as p-automaton $A_{B'}$, represented
in Figure 4 with only the most significant guards and no update information. As
before, the same labels are used for automaton transitions and corresponding
program operations.

Fig. 4. Approximation automaton $A_{B'}$



Event “reaching tfi” (S=tfi) is simulated by introducing two locations Less and
Greater in $A_{B'}$. Initially $A_{B'}$ is in Greater, with constraint
S=tfi=tla ∧ ACR=FR=LR=Emx=R. Location Less has S<=tfi as an invariant, in order
to force execution of transition [9b] (if tfi<tla) or [9a] (if tfi=tla) when S
reaches tfi. From Less, transition [9b] goes back to Less (since, after update,
S<tfi=tla) while transition [9a] switches to Greater (since S>tfi=tla as time
increases).

The reception of an RM cell corresponds to a transition newRM. There are two
cases depending on whether the source location is Less or Greater. From Less
(resp. Greater), transition newRM goes to location UpdAL (resp. UpdAG). This
transition is followed by an urgent transition from UpdAL (resp. UpdAG) back to
Less, which updates the discrete variables according to operations [1], ..., [6]
(resp. [7], [8]), as explained above. Note that transition newRM from Less to
UpdAL has an additional guard S<tfi in order to prevent an execution of newRM
before [9a] or [9b] when S=tfi (which is forbidden when “reaching tfi” and newRM
occur simultaneously).

As before, observation is modeled as a transition snapshot from location Less or
Greater to EndB. Also note that transition snapshot from Less to EndB has guard
S<tfi in order to prevent its execution before [9a] or [9b] when S=tfi (which is
forbidden when “reaching tfi” and the snapshot occur simultaneously).

3.4 Synchronized Product and Property U 

The complete system is obtained as the product automaton
$T = A_{env} \times A_I \times A_{B'}$ of the three p-automata above,
synchronized by the labels newRM and snapshot. In order to mechanically prove
property U, we have to compute $Post^*$ for the product automaton T, starting
from its initial region
$Q_{init} = (Wait, Idle, Greater) \times \varphi_{init}$,
where $\varphi_{init}$ is the constraint
S=tfi=tla ∧ R=E=ACR=FR=LR=Emx ∧ 0<a<b.

We then have to check that $Post^*(Q_{init})$ does not contain any state where
the property U is violated. Recall that property U is expressed, in terms of the
ideal rate $E_t(s)$ computed by I and the approximate value $A(s)$ computed by
B', by: for all t, when s reaches t, $A(s) \geq E_t(s)$. In our model T, E
corresponds to $E_t(s)$, ACR to $A(s)$ and snapshot (at S=t) makes the automaton
switch to its final state, hence property U translates as:

when T is in location (EndE, EndI, EndB), $ACR \geq E$.

The set of states where U does not hold is therefore the region
$Q_{\neg U} = (EndE, EndI, EndB) \times (ACR < E)$.

As explained in Section 2, we have to check
$Post^*(Q_{init}) \cap Q_{\neg U} = \emptyset$ or, alternatively,
$Pre^*(Q_{\neg U}) \cap Q_{init} = \emptyset$.

4 Verification of Correctness 

4.1 Verification with HyTech 

Automata $A_{env}$, $A_I$ and $A_{B'}$ can be directly implemented in HyTech
[12], which automatically computes the synchronized product T. The forward
computation of $Post^*(Q_{init})$ requires 23 iteration steps and its
intersection with $Q_{\neg U}$ is checked to be empty. This takes 487 sec. on a
SUN ULTRA-1 station with 64 Megabytes of RAM. Alternatively, the backward
computation of $Pre^*(Q_{\neg U})$ requires 15 iteration steps and its
intersection with $Q_{init}$ is checked to be empty in 90 sec. The automated
proof of correctness of B' is thus achieved. Recall that these automata (with 3
parameters a, b and t) belong to an undecidable class [4], so termination was
not guaranteed a priori.

4.2 Comparison with Previous Work

Verification at CNET. Ideal rate algorithm I and correctness property U
($S = t \Rightarrow E \leq ACR$) have been formalized by J.-F. Monin and F. Klay
at CNET. In [14], they give the first manual proof of U, using the classical
method of invariants. They first split U into a conjunction of two properties:

$U_1: tfi < S < t \Rightarrow E \leq ACR$ and
$U_2: S < t < tfi \Rightarrow E \leq ACR$.

The proof of $U_1 \wedge U_2$ is then done in two steps. First, $U_1 \wedge U_2$
is in turn strengthened into
$V = U_1 \wedge U_2 \wedge U_3 \wedge \cdots \wedge U_{10}$, where
$U_3, \ldots, U_{10}$ are nontrivial auxiliary properties of B'. Second, V is
proved to be an invariant (true initially and remaining true after each event).
The invariance proof for V has been mechanically checked with the proof
assistant COQ [13]. The auxiliary properties $U_3, \ldots, U_{10}$ can be seen
as “lemmas” necessary to achieve the proof of $U_1 \wedge U_2$ by (fixpoint)
induction.

With respect to our approach, property V can be seen as a fixpoint of Post
and, as such, is an upper approximation of $Post^*$ (since $Post^*$ is the least
fixpoint). The main advantage of our approach is that no auxiliary property
(“lemma”) such as $U_3, \ldots, U_{10}$ has to be manually discovered: U is
mechanically verified in its original form. Note that one of the $U_i$
($tfi = tla \Rightarrow FR = LR$) is not true in our model but should be
replaced by $tfi = tla \Rightarrow FR \geq LR$. This is a consequence of the
slightly more general hypothesis $r_1 \leq r_2 \leq \cdots \leq r_n$ instead of
$r_1 < r_2 < \cdots < r_n$. Another advantage here is that $Post^*$
characterizes all the properties of the system, and not only U. Therefore
$Post^*$ can be immediately reused for proving any other property P of the
system by testing that $Post^*$ does not contain any state violating P. Finally,
our model is likely to be reusable for modeling and verifying enhanced versions
of B' which are currently under development at CNET.



Verification with GAP. In [10], we achieved a first mechanical proof of U by
encoding the successor relation of the system as a logic program with
arithmetical constraints, and computing a fixed-point of the program through the
bottom-up evaluation procedure of Revesz [16]. The encoding required an
approximation of the successor relation, so that only an upper approximation of
$Post^*$ was generated. Nevertheless this approximation was sufficient to prove
U, because it did not contain any state violating U.

With respect to that approach, we used here HyTech [12], a sophisticated and
widely used analysis tool for hybrid systems [2], rather than GAP, a specific
prototype implementation of Revesz's procedure [11]. Therefore our results are
now easily reproducible. Besides, with respect to GAP, we reach an exact
fixed-point rather than an approximation, and the execution is much (about 10
times) faster. On the other hand, termination of the fixpoint computation was
guaranteed with GAP by Revesz's decidability result.

5 Final Remarks 

Our model is a direct translation, without any simplification, of the real
algorithm B' described in the international norm ITU I-371. We automatically
proved the basic correctness property U of algorithm B' using HyTech [12] (the
full HyTech code is given in [6]). The proof is parametric in the sense that U
holds for all values of the two parameters a and b (with 0 < a < b) involved in
B'. A third parameter t was used for specifying property U itself. Such a proof
is a priori impossible to do with other analysis tools for real-time systems
such as UPPAAL [5] or KRONOS [8], due to this use of parameters. Our analysis
contributes to a better understanding of the correctness proof for the ABR
conformance protocol, in particular by relaxing some unnecessary assumptions. It
paves the way for the verification of enhanced versions of B' currently under
development at CNET. This case study gives additional evidence of the importance
of (variants of) parametric timed automata [4] as a means for modeling and
analysing real industrial applications. Other successful verifications of
parametric concurrent systems using HyTech can be found in [7].

Abstract. Model-checking and testing are different activities, at least
conceptually. While model-checking consists in comparing two specifications at
different abstraction levels, testing consists in trying to find errors or gain
some confidence in the correctness of an implementation with respect to a
specification by the execution of test cases. Nevertheless, there are also
similarities in models and algorithms. We argue for this by giving a new
on-the-fly test generation algorithm which is an adaptation of a classical graph
algorithm that also serves as a basis of some model-checking algorithms. This
algorithm is Tarjan's algorithm, which computes the strongly connected
components of a digraph.



1 Introduction 

Conformance testing aims at applying test cases to an implementation under
test (IUT) in order to detect errors or increase one's confidence in the fact
that the IUT is correct with respect to its specification. It is black-box
testing: the source of the IUT is unknown but its behavior is known through its
interactions with the environment. Conformance testing is applied in several
domains and especially in the domain of protocols, where this activity is
standardized but not well formalized by [1]. [16] partly bridges this gap by
defining a formal framework but does not instantiate it into a precise test
generation algorithm.

Nevertheless, a lot of theoretical work has been done on test generation
algorithms. Some syntactical methods exist but we will limit our discussion to
semantical ones. Semantical methods can be divided into two classes which differ
in their models, theories and algorithms. Techniques based on automata theory
(see e.g. [20] for a survey) use Mealy machines (automata with each transition
labelled with an input and an output) as models. They theoretically have a
powerful fault coverage but make strong assumptions on specifications and IUT
and are limited to small specifications. The other class of semantical
techniques uses the model of labelled transition systems (LTS). They stem from
fundamental studies on testing theory [8,2,5]. Originally defined for general
LTS, their applicability was not clear. But they were the starting points for
more realistic theories based on LTS with differentiated input and output
transitions named IOSM, IOTS or IOLTS [25,21]. The central point is a
conformance relation relating specifications to correct implementations. These
methods at least ensure unbias (only non-conformant implementations can be
rejected by a test case) and “theoretical” exhaustiveness (under some
assumptions on implementations, all non-conformant implementations can be
rejected by a test suite).

In [12] we proposed a first on-the-fly test generation algorithm and we
completed the picture in [17]. These algorithms have been implemented in our
prototype tool TGV and gave good results on industrial experiments [13]. The
main algorithm was based on a traversal of the synchronous product of a test
purpose automaton with an IOLTS representing the observable behavior of the
specification. We thought that test cases should be acyclic in order to ensure
the finiteness of their execution on the IUT, so the algorithm was cutting
loops. But test practitioners and standardized test suites showed us that this
was not always the case. This is the reason why we started to investigate a way
of producing test cases with loops.

Some model-checking algorithms for CTL [6] or LTL [23], in particular local
or on-the-fly ones, also have to deal with loops. Some of these algorithms (see
e.g. [7,26]) are adaptations of the classical Tarjan's algorithm, which computes
the strongly connected components (SCCs) of a digraph during a depth first
search (DFS). For on-the-fly model-checking, this algorithm has the advantage of
providing a diagnostic sequence in the stack as soon as a violation of the
property is detected. This facility has been used for test generation [10], as
the negation of the checked property can be seen as a test selection criterion,
i.e. a test purpose.

In our opinion, this is not sufficient, as diagnostic sequences have to be
further transformed into test sequences by taking into account the output
freedom of the specification, thus giving test sequences with possibly a lot of
Inconclusive verdicts. These verdicts should be reduced to a minimum by
generating more adaptive test cases. We believe that test generation can benefit
from model-checking algorithms, but these need some adaptations to the testing
framework. We present here such algorithms.

The paper is organized as follows. We first present in Section 2 the models 
used for test generation. Section 3 then gives an iterative formulation of 
Tarjan's algorithm and presents it as a framework for the derivation of all other 
algorithms presented in the paper. The three subsequent sections present 
instantiations of this framework for test generation. Section 4 describes an algorithm 
computing the subgraph of all sequences leading to Accept states of the test 
purpose; it can be seen as a complete diagnosis for the CTL property $AG\,\neg Accept$ 
(or a complete explanation of $EF\,Accept$). In Section 5 we extract one test case 
from this subgraph. Section 6 improves the first algorithm for on-the-fly 
generation by anticipating operations of the second algorithm. Section 7 then describes 
how these algorithms are integrated into our tool TGV. We conclude with a 
comparison with other works and perspectives. 



* Inconclusive verdicts are given to correct outputs of the IUT which do not lead to 
the satisfaction of the test purpose. 

2 Conformance Testing 



In this section we introduce the models used for test case generation and how 
they are used to describe specifications, implementations, test cases and test 
purposes. These models are based on the classical model of labelled transition 
systems with distinguished inputs and outputs. We refer to [25] for a precise 
definition of the testing theory used. 



Definition 1. An IOLTS is an LTS $M = (Q^M, A^M, \rightarrow_M, q_0^M)$ with $Q^M$ a finite set 
of states, $A^M$ a finite alphabet partitioned into three distinct sets $A^M = A_I^M \cup A_O^M \cup I^M$, 
where $A_I^M$ and $A_O^M$ are respectively input and output alphabets and $I^M$ is an 
alphabet of unobservable, internal actions, $\rightarrow_M \subseteq Q^M \times A^M \times Q^M$ is the transition 
relation and $q_0^M$ is the initial state. 



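We will use the following classical LTS notations for IOLTS. 
Let $q, q', q_{(i)} \in Q^M$, $Q \subseteq Q^M$, $a_{(i)} \in A_I^M \cup A_O^M$, $\tau_{(i)} \in I^M$ 
and $\sigma \in (A_I^M \cup A_O^M)^*$. We write $q \xrightarrow{a}_M q'$ for $(q, a, q') \in \rightarrow_M$. 

$q \stackrel{\epsilon}{\Rightarrow}_M q' \equiv (q = q' \vee q \xrightarrow{\tau_1 \cdots \tau_n}_M q')$ and 
$q \stackrel{a}{\Rightarrow}_M q' \equiv \exists q_1, q_2 : q \stackrel{\epsilon}{\Rightarrow}_M q_1 \xrightarrow{a}_M q_2 \stackrel{\epsilon}{\Rightarrow}_M q'$, 
which generalizes in 
$q \stackrel{a_1 \cdots a_n}{\Rightarrow}_M q' \equiv \exists q_0, \ldots, q_n : q = q_0 \stackrel{a_1}{\Rightarrow}_M q_1 \cdots \stackrel{a_n}{\Rightarrow}_M q_n = q'$. 

$Trace_M(q) = \{\sigma \mid q \stackrel{\sigma}{\Rightarrow}_M\}$ and $Trace(M) = Trace_M(q_0^M)$. 

We note $q \ \mathrm{after}_M \ \sigma = \{q' \mid q \stackrel{\sigma}{\Rightarrow}_M q'\}$ and 
$Q \ \mathrm{after}_M \ \sigma = \bigcup_{q \in Q} q \ \mathrm{after}_M \ \sigma$. We 
define $Out_M(q) = \{a \in A_O^M \mid q \stackrel{a}{\Rightarrow}_M\}$ and $Out_M(Q) = \bigcup \{Out_M(q) \mid q \in Q\}$. We will 
not always distinguish between an IOLTS and its initial state, and note $M \Rightarrow$ 
instead of $q_0^M \Rightarrow$. We will omit the subscript $M$ when it is clear from the context. 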

A specification is given in a formal description language (e.g. SDL, LOTOS 
or Estelle) whose semantics allows the behavior of the specification to be described 
by an IOLTS $S = (Q^S, A^S, \rightarrow_S, q_0^S)$. The IOLTS S and the intermediate IOLTS 
defined from S are not effectively built, but we need to define them for reasoning. 
As usual, we will assume that the behavior of the IUT can also be described 
by an IOLTS which can never refuse an input: $IUT = (Q^{IUT}, A^{IUT}, \rightarrow_{IUT}, q_0^{IUT})$ 
with $A^{IUT} = A_I^{IUT} \cup A_O^{IUT} \cup I^{IUT}$, $A_I^S \subseteq A_I^{IUT}$ and $A_O^S \subseteq A_O^{IUT}$. We use a 
conformance relation which says that an IUT conforms to S if and only if, after 
a trace of S, outputs of the IUT are outputs of S: 

$IUT \ \mathbf{ioconf} \ S \equiv \forall \sigma \in Trace(S), \ Out(IUT \ \mathrm{after}_{IUT} \ \sigma) \subseteq Out(S \ \mathrm{after}_S \ \sigma)$ 

For the sake of clarity, we took the definition of ioconf, but all results also apply 
to ioco [25], which considers quiescence (i.e. deadlock and output quiescence) as 
an observable event. In [17] we also treat livelocks. The relation ioco is obtained 
from ioconf by adding loops labelled with a particular output $\delta$ to quiescent 
states of S and IUT. 
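To make the definitions of after, Out and ioconf concrete, the following Python sketch checks the relation on small IOLTS given explicitly as dictionaries. The encoding, the `TAU` label and all function names are assumptions made for this illustration only; they are not part of TGV. Traces are enumerated up to a fixed length, so this is a bounded check and not a decision procedure, and quiescence (ioco) is not handled.

```python
from itertools import product

TAU = "tau"  # hypothetical label standing for every internal action


def eps_closure(trans, states):
    """States reachable from `states` through internal actions only."""
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for (a, q2) in trans.get(q, ()):
            if a == TAU and q2 not in seen:
                seen.add(q2)
                todo.append(q2)
    return seen


def after(trans, q0, sigma):
    """The set `q0 after sigma` for a sequence of observable actions."""
    current = eps_closure(trans, {q0})
    for a in sigma:
        step = {q2 for q in current for (b, q2) in trans.get(q, ()) if b == a}
        current = eps_closure(trans, step)
    return current


def out(trans, states, outputs):
    """Outputs enabled in a set of states."""
    return {a for q in states for (a, _) in trans.get(q, ()) if a in outputs}


def ioconf(iut, spec, q0_iut, q0_spec, inputs, outputs, depth):
    """Bounded check of Out(IUT after s) <= Out(S after s) for traces s of S."""
    alphabet = sorted(inputs | outputs)
    for n in range(depth + 1):
        for sigma in product(alphabet, repeat=n):
            s_states = after(spec, q0_spec, sigma)
            if not s_states:  # sigma is not a trace of S
                continue
            i_states = after(iut, q0_iut, sigma)
            if not out(iut, i_states, outputs) <= out(spec, s_states, outputs):
                return False
    return True
```

For instance, with a specification that answers input `a` with output `x`, an implementation answering `y` is rejected, while one that performs an internal step before emitting `x` is accepted.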

In practice, test purposes are used as test selection criteria. We formalize 
them by automata, i.e. IOLTS with selected marked states. A test purpose is 
a deterministic IOLTS $TP = (Q^{TP}, A^{TP}, \rightarrow_{TP}, q_0^{TP})$ equipped with two sets of 
sink states: $Accept^{TP}$, which defines Pass verdicts, and $Reject^{TP}$, which allows 
limiting the exploration of the graph S. We suppose that $A^{TP} = A^S$ (this authorizes 
actions of TP to be internal actions of S, which is useful for testing in context) 
and that TP is complete ($\forall q \in Q^{TP}, \forall a \in A^{TP}, q \xrightarrow{a}_{TP}$). 

* LOTOS does not distinguish between inputs and outputs and a renaming is 
necessary for test generation. 

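The synchronous product of S and TP is an IOLTS $SP = (Q^{SP}, A^{SP}, \rightarrow_{SP}, q_0^{SP})$ 
with $Q^{SP} = Q^S \times Q^{TP}$, $A^{SP} = A^S$, 
$(p, q) \xrightarrow{a}_{SP} (p', q') \iff p \xrightarrow{a}_S p'$ and $q \xrightarrow{a}_{TP} q'$, and 
$q_0^{SP} = (q_0^S, q_0^{TP})$. SP can be understood as an automaton with sets of sink states 
defined by $Accept^{SP} = Q^S \times Accept^{TP}$ and $Reject^{SP} = Q^S \times Reject^{TP}$. 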

As test generation only considers the observable behavior of S, a first step is 
to replace in SP all internal actions by $\tau$, to reduce $\tau$ actions (while adding $\delta$ 
loops for ioco) and to determinize the result. 

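This defines an IOLTS $SP_{vis} = (Q^{vis}, A^{vis}, \rightarrow_{vis}, q_0^{vis})$ with $Q^{vis} \subseteq 2^{Q^{SP}}$, 
$A^{vis} = A_I^{vis} \cup A_O^{vis}$, $A_I^{vis} = A_I^{SP}$, $A_O^{vis} = A_O^{SP}$, $q_0^{vis} = q_0^{SP} \ \mathrm{after}_{SP} \ \epsilon$, and 
$\forall a \in A^{vis}, \forall P, P' \in Q^{vis}, \ P \xrightarrow{a}_{vis} P' \iff P' = P \ \mathrm{after}_{SP} \ a$. $SP_{vis}$ is equipped 
with sets of sink states defined by $Reject^{vis} = \{s \in Q^{vis} \mid s \cap Reject^{SP} \neq \emptyset\}$ 
and $Accept^{vis} = \{s \in Q^{vis} \mid s \cap Accept^{SP} \neq \emptyset\} \setminus Reject^{vis}$. 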
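The two constructions above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the graphs are built eagerly rather than on-the-fly, every internal action is already renamed to a hypothetical `TAU` label, no $\delta$ loops are added for quiescence, and all function names are invented for the example.

```python
TAU = "tau"  # hypothetical label for internal actions


def sync_product(spec, tp, q0_s, q0_tp):
    """Reachable part of S x TP; both components move on the same action."""
    init = (q0_s, q0_tp)
    trans, todo = {}, [init]
    while todo:
        state = todo.pop()
        if state in trans:
            continue
        p, q = state
        trans[state] = []
        for (a, p2) in spec.get(p, ()):
            for (b, q2) in tp.get(q, ()):
                if a == b:
                    trans[state].append((a, (p2, q2)))
                    todo.append((p2, q2))
    return trans, init


def eps_closure(trans, states):
    """States reachable from `states` through TAU transitions only."""
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for (a, q2) in trans.get(q, ()):
            if a == TAU and q2 not in seen:
                seen.add(q2)
                todo.append(q2)
    return seen


def determinize(trans, init):
    """Subset construction over observable actions (tau-reduction included)."""
    start = frozenset(eps_closure(trans, {init}))
    vis, todo = {}, [start]
    while todo:
        P = todo.pop()
        if P in vis:
            continue
        vis[P] = {}
        for a in {a for q in P for (a, _) in trans.get(q, ()) if a != TAU}:
            step = {q2 for q in P for (b, q2) in trans.get(q, ()) if b == a}
            P2 = frozenset(eps_closure(trans, step))
            vis[P][a] = P2
            todo.append(P2)
    return vis, start


def lift_marks(vis, accept_sp, reject_sp):
    """Accept and Reject sets of the determinized graph; Reject has priority."""
    reject = {P for P in vis if P & reject_sp}
    accept = {P for P in vis if P & accept_sp} - reject
    return accept, reject
```

Each state of the determinized graph is a frozen set of product states, which mirrors $Q^{vis} \subseteq 2^{Q^{SP}}$.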

Test cases are constructed from the IOLTS $SP_{vis}$. Before going to that 
construction, we must define what test cases are. A test case is an IOLTS 
$TC = (Q^{TC}, A^{TC}, \rightarrow_{TC}, q_0^{TC})$ with distinguished subsets of states Pass, Inconc 
and a new state fail. TC should have the following properties: 

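1. $Q^{TC} \subseteq Q^{vis} \cup \{fail\}$, and $q_0^{TC} = q_0^{vis}$, 

2. $A^{TC} = A_I^{TC} \cup A_O^{TC}$ with $A_O^{TC} \subseteq A_I^{vis}$ and $A_I^{TC} \subseteq A_O^{IUT}$ (mirror image and all 
possible outputs of the IUT considered), 

3. $Pass = Accept^{vis} \cap Q^{TC}$, $Inconc \subseteq Q^{TC}$, Pass, Inconc and fail are sink 
states and every state of TC except fail can reach either a Pass or an 
Inconc state; fail and Inconc states can be reached directly only by inputs, 

4. $\forall q \in Q^{TC}, \forall a \in A_I^{TC}, \ (\exists q' \in Inconc \cup \{fail\}, \ q \xrightarrow{a}_{TC} q' \Rightarrow q \notin Pass)$ and 
$(q \xrightarrow{a}_{TC} fail \iff \neg (q \xrightarrow{a}_{vis}))$, 

5. $\forall q, q' \in Q^{TC}, \forall a \in A^{TC}, \ q \xrightarrow{a}_{TC} q' \wedge q' \neq fail \Rightarrow q \xrightarrow{a}_{vis} q'$, 

6. maximality: $\forall q \in Inconc$, no state of $Accept^{vis}$ is reachable from $q$ in $SP_{vis}$, 

7. controllability: $\forall q \in Q^{TC}, \forall a \in A_O^{TC}, \ q \xrightarrow{a}_{TC} \ \Rightarrow \ \forall b \neq a, \ \neg (q \xrightarrow{b}_{TC})$, and 
$\forall q \in Q^{TC}, \ (\exists a \in A_I^{TC}, \ q \xrightarrow{a}_{TC} \ \Rightarrow \ \forall b \in A_I^{TC}, \ q \xrightarrow{b}_{TC})$. 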

Some of these properties come from the definition of ioconf and ensure unbias 
(i.e. no correct implementation can be rejected by a test case). Some other 
properties, such as the controllability condition, come from test practice: a tester 
always controls its outputs. The maximality property ensures that no Inconclusive 
verdict can be given in a state where a Pass verdict could eventually be obtained 
later. In a state where an input is possible, all possible outputs of an IUT are 
considered. This input completion allows us to consider transitions leading to 
fail as implicit in the sequel. These properties do not uniquely identify a test 
case, thus we have to provide a constructive algorithm which ensures them. This 
is the subject of the following sections. 

3 Tarjan's Algorithm as a Framework 

In this section we present an iterative version of the algorithm 
"StrongConnect" [24], which computes the SCCs of a given digraph G, as a 
framework to derive other algorithms. First, we introduce all the notions and 
results used to describe this framework and the following algorithms. Finally, we 
show the framework "StrongConnect" and its resulting graph. 




Fig. 1. Partition of edges defined by a DFS 



Recall that a graph is strongly connected if for each pair of states (v, w) there 
is a path from v to v containing w. The SCCs of a graph G are the maximal 
strongly connected subgraphs of G. A DFS applied to G defines a spanning 
forest F by considering edges leading to unvisited states (tree-arcs). We suppose 
that states are numbered in the order in which they are reached during the 
search (field "number" of a state). Inspired by [24], a DFS defines a partition 
of edges (see Figure 1): edges leading to a new state (not yet numbered) of 
the same (resp. distinct) SCC(s) are called "tree-arcs_in" (resp. "tree-arcs_out"); 
"fronds" are edges running from descendants to ancestors in the tree; 
"short-cuts_in" (resp. "short-cuts_out") are edges running from ancestors to descendants 
of the same (resp. distinct) SCC(s); edges in a SCC (resp. between two SCCs) 
running from one subtree to another subtree of F are called "cross-links_in" (resp. 
"cross-links_out"). For any frond (resp. short-cut) between two states v and u 
(resp. u and v), there exists a path of tree-arcs from u leading to v. A root of a 
SCC is the first reached state of this SCC and thus its smallest numbered state. 
The field "lowlink" of a state allows the root of each SCC to be detected by synthesizing 
the smallest numbered state which is in the same component and is reachable by traversing 
zero or more tree-arcs followed by at most one frond or cross-link_in. A state is 
a root of a SCC if and only if its number and its lowlink are equal. The 
following framework is the well-known algorithm "StrongConnect" of Tarjan, where 
some sections (Start state, New state, Old state, State of a new SCC, Tree-arc 
backtrack) are left empty for derived algorithms. This algorithm identifies the 
set of SCCs of a graph G. The stack "Dfs_Stack" stores the current exploration 
sequence during the search and the stack "Scc_Stack" keeps all the visited states 
whose SCC is not yet completed. The field "act" of a state q is the label of the 
tree-arc leading to q. 

The function Adjacency_Set gives the fireable transitions from a given state q: 

Adjacency_Set (q) := {(a, q') | (q, a, q') ∈ →_G} 

Procedure StrongConnect (state : q_start); 
  state : q_source, q_target, q_pred; 
  adjacency_set : Adj_source, Adj_target, Adj_pred; 
BEGIN 
  Creation of Dfs_Stack; Creation of Scc_Stack; 
  q_start.number := q_start.lowlink := i := i + 1; q_start.act := ε; 
  [Start state] 
  Push (q_start, Adjacency_Set (q_start)) in Dfs_Stack; Push q_start in Scc_Stack; 
  while not empty Dfs_Stack do begin 
    (q_source, Adj_source) := Top (Dfs_Stack); 
    if not empty Adj_source then begin 
      Remove (m, q_target) from Adj_source; 
      if q_target is not yet numbered then begin (* (q_source, m, q_target) is a tree-arc *) 
        q_target.number := q_target.lowlink := i := i + 1; q_target.act := m; 
        Push (q_target, Adjacency_Set (q_target)) in Dfs_Stack; Push q_target in Scc_Stack; 
        [New state] 
      end 
      else begin (* (q_source, m, q_target) is not a tree-arc *) 
        if q_target.number < q_source.number and q_target in Scc_Stack then 
          (* (q_source, m, q_target) is a frond or a cross-link_in (in same SCC) *) 
          q_source.lowlink := min (q_source.lowlink, q_target.number); 
        [Old state] 
      end 
    end 
    else begin (* Adj_source is empty *) 
      Pop (q_source) from Dfs_Stack; 
      if q_source.lowlink = q_source.number then begin (* q_source is root of SCC *) 
        while not empty Scc_Stack and q := Top (Scc_Stack) satisfies q.number >= q_source.number do 
        begin (* creation of a new component *) 
          Pop (q) from Scc_Stack and put q in current component; 
          [State of a new SCC] 
        end 
      end 
      if not empty Dfs_Stack then begin (* backtracking *) 
        (q_pred, Adj_pred) := Top (Dfs_Stack); 
        q_pred.lowlink := min (q_pred.lowlink, q_source.lowlink); 
        [Tree-arc backtrack] 
      end 
    end 
  end 
END; 

Procedure MAIN (G); 
BEGIN 
  integer : i := 0; 
  for q_start in G do if q_start not yet numbered then StrongConnect (q_start); 
END; 

We can interpret the result of this program as the reduced directed acyclic 
graph (DAG) where a node is a SCC of G and where edges are either tree-arcs_out, 
cross-links_out, or short-cuts_out. This algorithm is linear in both space and time. 
Notice that if the input graph G is rooted, a call to StrongConnect with the root 
will visit all states. 
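As an illustration, the iterative search can be transcribed into Python roughly as follows. This is a sketch and not the TGV implementation: transition labels and the five framework hooks are omitted, the graph is assumed to be a dictionary mapping each state to a list of successors, and the function name is hypothetical.

```python
_DONE = object()  # sentinel marking an exhausted adjacency iterator


def scc_framework(adjacency, roots):
    """Iterative Tarjan search; returns the SCCs, each as a list of states."""
    number, lowlink = {}, {}
    scc_stack, on_scc_stack, sccs = [], set(), []
    counter = 0
    for start in roots:
        if start in number:
            continue
        counter += 1
        number[start] = lowlink[start] = counter
        dfs_stack = [(start, iter(adjacency.get(start, ())))]
        scc_stack.append(start)
        on_scc_stack.add(start)
        while dfs_stack:
            source, successors = dfs_stack[-1]
            target = next(successors, _DONE)
            if target is not _DONE:
                if target not in number:  # tree-arc
                    counter += 1
                    number[target] = lowlink[target] = counter
                    dfs_stack.append((target, iter(adjacency.get(target, ()))))
                    scc_stack.append(target)
                    on_scc_stack.add(target)
                elif number[target] < number[source] and target in on_scc_stack:
                    # frond or cross-link inside the same SCC
                    lowlink[source] = min(lowlink[source], number[target])
            else:  # adjacency set exhausted
                dfs_stack.pop()
                if lowlink[source] == number[source]:  # source is a root
                    component = []
                    while scc_stack and number[scc_stack[-1]] >= number[source]:
                        q = scc_stack.pop()
                        on_scc_stack.discard(q)
                        component.append(q)
                    sccs.append(component)
                if dfs_stack:  # tree-arc backtrack
                    pred = dfs_stack[-1][0]
                    lowlink[pred] = min(lowlink[pred], lowlink[source])
    return sccs
```

Components are produced in the order in which their roots are completed, so a component is always emitted after every component it can reach. This ordering is what lets information be propagated backwards along the reduced DAG.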



4 Computation of the Complete Test Graph 

Notice that if the controllability condition defined in Section 2 is suppressed and 
in condition 3 we redefine Pass by $Pass = Accept^{vis}$, the resulting set of 
properties uniquely defines a subgraph of $SP_{vis}$ called the Complete Test Graph (CTG), 
as it defines all potential test cases w.r.t. TP. Even if it does not define a test 
case, it is sometimes interesting to produce CTG and then to separate it into 
a set of test cases. The following algorithm, which instantiates the undefined parts of 
the StrongConnect framework, computes the subgraph of the graph $SP_{vis}$ composed 
of all sequences leading to states in $Accept^{vis}$. Moreover, as ioco forces us to take 
into account all outputs of the specification, those not leading to $Accept^{vis}$ have 
to be kept. Their target states are put in the Inconclusive set. 

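$Trace(CTG) = \{\sigma \mid SP_{vis} \ \mathrm{after} \ \sigma \subseteq Accept^{vis}\}$ 
$\quad \cup \ \{\sigma.a \in Trace(SP_{vis}) \mid a \in A_O^{vis} \wedge \exists \sigma' \, (SP_{vis} \ \mathrm{after} \ \sigma.\sigma' \subseteq Accept^{vis})\}$ 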

The problem of finding this subgraph reduces to the problem of finding the 
reduced DAG of SCCs which lead to an Accept state, and thus reduces to finding 
the roots of SCCs leading to an Accept state. For a DAG, a simple DFS allows us 
to correctly synthesize the reachability to an Accept state along tree-arcs_out, 
cross-links_out and short-cuts_out (there is no other type of edge in such a graph). 
This property is used to prove the correctness of the synthesis for each root of 
SCC using the underlying DAG structure of SCCs. Notice that a short-cut 
between u and v does not give additional information regarding reachability in 
u w.r.t. v because there exists another path of tree-arcs from u leading to v. 
A root of a SCC leads to an Accept state if and only if it is an Accept state 
or a state of its SCC can reach another SCC, by a tree-arc_out or cross-link_out, 
which leads to an Accept state. The field "L2A" of a state, meaning "Leads to 
an Accept state", is used to synthesize this reachability information. Moreover, 
L2A is also used for garbage collection of unnecessary parts of the graph. 

When we reach a state for the first time, its L2A field is initialized to true if and 
only if it is an Accept state. So: 

[Start state] 
  q_start.L2A := q_start ∈ Accept^vis; 
  if q_start ∈ Reject^vis ∪ Accept^vis then 
    remove all (q_start, a, q) from →_vis; 

[New state] 
  q_target.L2A := q_target ∈ Accept^vis; 
  if q_target ∈ Reject^vis ∪ Accept^vis then 
    remove all (q_target, a, q) from →_vis; 

When a state is reached again, only cross-link_out transitions add more information 
about reachability to an Accept state to the root of a strongly connected 
component. An input short-cut or cross-link_out to a SCC not leading to Accept is 
pruned. So: 

[Old state] 
  if q_target not in Scc_Stack then begin 
    (* It is a short-cut or cross-link_out *) 
    if q_target.number < q_source.number then (* It is a cross-link_out *) 
      q_source.L2A := q_source.L2A ∨ q_target.L2A; 
    if ¬q_target.L2A ∧ m ∈ A_I^vis then 
      remove (q_source, m, q_target) from →_vis; 
  end 

When a root of a SCC is found, its L2A field is correct. All the states of this SCC 
update their L2A field w.r.t. their root and the part of the graph which cannot 
lead to Accept is pruned. 

[State of a new SCC] 
  q.L2A := q_source.L2A; 
  if ¬q.L2A then 
    remove all (q, a, q') from →_vis; 

We have to synthesize the reachability information along tree-arcs. An input 
tree-arc_out leading to a state not in CTG is pruned. So: 

[Tree-arc backtrack] 
  q_pred.L2A := q_pred.L2A ∨ q_source.L2A; 
  if q_source.number = q_source.lowlink ∧ ¬q_source.L2A ∧ m' := q_source.act ∈ A_I^vis then 
    remove (q_pred, m', q_source) from →_vis; 



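Let CTG be the subgraph obtained by this algorithm from $SP_{vis}$ and reduced 
to the part accessible from its initial state. $CTG = (Q^{CTG}, A^{CTG}, \rightarrow_{CTG}, q_0^{CTG})$ 
with two sets of marked states $Pass^{CTG}$ and $Inconc^{CTG}$ such that: 

$A^{CTG} = A_O^{CTG} \cup A_I^{CTG}$ with $A_O^{CTG} \subseteq A_I^{vis}$ and $A_I^{CTG} = A_O^{vis}$ (mirror image), 
$\rightarrow_{CTG} = \{(v, a, w) \in \rightarrow_{vis} \mid v.L2A \wedge (w.L2A \vee a \in A_I^{CTG})\}$, 
$Q^{CTG} = \{v \in Q^{vis} \mid q_0^{vis} \rightarrow^{*}_{CTG} v\}$, $Pass^{CTG} = Accept^{vis} \cap Q^{CTG}$, 
$Inconc^{CTG} = \{v \in Q^{CTG} \setminus Pass^{CTG} \mid \forall a \in A^{CTG}, \forall w \in Q^{CTG}, (v, a, w) \notin \rightarrow_{CTG}\}$. 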

The graph CTG contains all the behaviors a test case might have. According 
to the test case definition, now we have to deal with controllability. 
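The definition of CTG can be illustrated by a small Python sketch. Note that it deliberately replaces the on-the-fly StrongConnect instantiation by a simpler, eager technique: L2A is computed by backward reachability from the Accept states over an explicitly stored graph. The dictionary encoding of $SP_{vis}$, the parameter names and the function name are assumptions of this example.

```python
from collections import deque


def complete_test_graph(vis, q0, accept, reject, spec_outputs):
    """vis maps each state to {action: successor} (deterministic graph).

    Returns (ctg, passes, inconc), where ctg has the same shape as vis.
    """
    sinks = accept | reject
    # Predecessor relation; Accept and Reject states are treated as sinks.
    preds = {}
    for v, succ in vis.items():
        if v in sinks:
            continue
        for w in succ.values():
            preds.setdefault(w, set()).add(v)
    # L2A: states from which some Accept state is reachable.
    l2a, todo = set(accept), deque(accept)
    while todo:
        w = todo.popleft()
        for v in preds.get(w, ()):
            if v not in l2a:
                l2a.add(v)
                todo.append(v)
    # Keep (v, a, w) when v leads to Accept and either w does too or a is
    # an output of the specification (an input of the tester).
    ctg, seen, todo = {}, {q0}, deque([q0])
    while todo:
        v = todo.popleft()
        ctg[v] = {}
        if v in sinks or v not in l2a:
            continue
        for a, w in vis.get(v, {}).items():
            if w in l2a or a in spec_outputs:
                ctg[v][a] = w
                if w not in seen:
                    seen.add(w)
                    todo.append(w)
    passes = {v for v in ctg if v in accept}
    inconc = {v for v in ctg if v not in passes and not ctg[v]}
    return ctg, passes, inconc
```

In this sketch a specification input that cannot lead to Accept is dropped, whereas a specification output that cannot lead to Accept is kept and its target becomes an Inconclusive state.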



5 Extraction of a Controllable Test Graph 

We present here an algorithm based on the StrongConnect framework which 
computes a subgraph of the accepted subgraph CTG that is controllable and in which each 
state can reach a Pass state or is an Inconclusive state. Informally, the adaptation 
of StrongConnect consists of a DFS starting from each Pass state of CTG and 
using the predecessor transition relation. Possible controllability 
conflicts are corrected for each newly reached state by pruning the actions that conflict with the 
current action. This may modify the reachability from the initial state, which is 
synthesized in order to determine the set of states of the resulting controllable 
test case. This is based on the same scheme as the synthesis of L2A (field rfis 
for "Reachable From the Initial State"). 



The function Adjacency_Set takes into account the backward sight of the new 
algorithm. 

Adjacency_Set (q) := {(a, q') | (q', a, q) ∈ →_CTG} 

The procedure Pruning allows us to prune transitions of CTG causing controllability 
conflicts with the current transition of a given state. 

Procedure Pruning (q, m) 
Begin 
  if m ∈ A_O^CTG then (* remove all others *) 
    ∀m' ≠ m, remove (q, m', q') from →_CTG; 
  else (* remove all the outputs *) 
    ∀m' ∈ A_O^CTG, remove (q, m', q') from →_CTG; 
End 

The initialization and synthesis of the rfis field are based on the L2A field scheme. 
We prune conflicting transitions when a new state is reached. 

[Start state] 
  q_start.rfis := (q_start = q_0^CTG); 

[New state] 
  q_target.rfis := (q_target = q_0^CTG); 
  Pruning (q_target, m); 

[Old state] 
  q_source.rfis := q_source.rfis ∨ q_target.rfis; 
  if ¬q_target.rfis ∧ m ∈ A_O^CTG then 
    remove (q_target, m, q_source) from →_CTG; 

[State of a new SCC] 
  q.rfis := q_source.rfis; 
  if ¬q.rfis then 
    remove all (q, a, q') from →_CTG; 

[Tree-arc backtrack] 
  q_pred.rfis := q_pred.rfis ∨ q_source.rfis; 
  if q_source.number = q_source.lowlink ∧ ¬q_source.rfis ∧ m' := q_source.act ∈ A_O^CTG then 
    remove (q_source, m', q_pred) from →_CTG; 



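Let $TC = (Q^{TC}, A^{TC}, \rightarrow_{TC}, q_0^{TC})$ with two sets $Pass^{TC}$ and $Inconc^{TC}$ such 
that: $A^{TC} = A_O^{TC} \cup A_I^{TC}$ with $A_O^{TC} = A_O^{CTG}$ and $A_I^{TC} = A_I^{CTG}$, 
$\rightarrow_{TC} = \{(v, a, w) \in \rightarrow_{CTG} \mid v.rfis \wedge (w.rfis \vee a \in A_I^{TC})\}$, 
$Q^{TC} = \{v \in Q^{CTG} \mid q_0^{TC} \rightarrow^{*}_{TC} v\}$, $Pass^{TC} = Pass^{CTG} \cap Q^{TC}$, 
$Inconc^{TC} = \{v \in Q^{TC} \setminus Pass^{TC} \mid \forall a \in A^{TC}, \forall w \in Q^{TC}, (v, a, w) \notin \rightarrow_{TC}\}$. 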

Remark: The order in which we apply "controllable" StrongConnect to Pass 
states of CTG influences the resulting test case. Breadth-first search starting 
from the set of Pass states could give shortest test cases, but StrongConnect 
allows garbage collection to be interleaved. The graph CTG represents all the 
behaviors to be tested w.r.t. a test purpose. We can derive from this graph sequential, 
arborescent, or looping test cases. We could even apply some automata-based 
methods such as UIO to each SCC of CTG. 
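The breadth-first alternative mentioned in the remark can be sketched in Python as follows. This is not the StrongConnect-based algorithm of this section: it is a simplified backward breadth-first search over an explicitly stored CTG, in which the first transition through which a state is reached decides how its controllability conflicts are resolved, following the rule of the Pruning procedure. The encoding and the names are assumptions of this example.

```python
from collections import deque


def extract_test_case(ctg, q0, passes, tester_outputs):
    """ctg maps each state to {action: successor}; returns a controllable subgraph."""
    tc = {v: dict(succ) for v, succ in ctg.items()}
    # Predecessor relation of the complete test graph.
    preds = {}
    for v, succ in ctg.items():
        for a, w in succ.items():
            preds.setdefault(w, []).append((v, a))
    # Backward breadth-first search from the Pass states.
    visited, todo = set(passes), deque(passes)
    while todo:
        w = todo.popleft()
        for v, a in preds.get(w, ()):
            if v in visited:
                continue
            visited.add(v)
            if a in tester_outputs:
                # The tester emits a: every other transition of v is pruned.
                tc[v] = {a: w}
            else:
                # The tester waits for an input: all its outputs are pruned.
                tc[v] = {b: x for b, x in tc[v].items() if b not in tester_outputs}
            todo.append(v)
    # Restrict the result to the part reachable from the initial state.
    reach, todo = {q0}, deque([q0])
    while todo:
        v = todo.popleft()
        for w in tc[v].values():
            if w not in reach:
                reach.add(w)
                todo.append(w)
    return {v: tc[v] for v in reach}
```

Because states are visited in order of increasing distance to Pass, each visited state keeps a transition towards a state that is strictly closer to Pass, so every state kept in the result can still reach a Pass state or ends in an Inconclusive sink.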



6 Solving Controllability Conflicts during Forward Search 

In the previous section, we have shown an algorithm resolving all the control- 
lability conflicts of the graph CTG. In fact some conflicts can be solved during 
the forward DFS as will be shown in this section. 

Notice that in the complete algorithm, we attempt to synthesize in an efficient 
way the information of reachability to an Accept state on all the roots of SCCs of 
$SP_{vis}$. First, we claim that a controllability conflict in a state can be removed 
during this algorithm in the case where we know that this state leads to an 
Accept state. This is done while backtracking a transition m between a source 
state whose L2A is still false and a target state whose L2A is true. 
In this case, we can prune all the transitions from the source state in conflict 
with m. This leads us to synthesize L2A not only for the root of a 
SCC but also for all the states of this SCC, which can then get the information earlier, 
and thus to prune parts of the initial graph earlier, saving time. In this 
algorithm refined from the CTG computation, the synthesis of L2A information 
is done along tree-arcs, cross-links and fronds, and possible pruning actions are 
done earlier. As seen before, short-cut transitions give redundant information. 



The function "Pruning" not only prunes conflicting transitions that are already 
synthesized but also conflicting transitions not yet treated. 

Procedure Pruning (q, m, Adj) 
Begin 
  if m ∈ A_I^vis then 
  begin (* remove all other transitions *) 
    Adj := ∅; 
    ∀m' ≠ m, remove all (q, m', q') from →_vis; 
  end 
  else begin (* remove all the inputs *) 
    ∀m' ∈ A_I^vis, remove all (q, m', q') from →_vis; 
    ∀m' ∈ A_I^vis, remove all (m', q') from Adj; 
  end 
End 

[Start state], [New state] and [State of a new SCC] are identical to the corresponding parts of 
the complete test graph computation. The synthesis of L2A is done along cross-links, 
short-cuts, fronds and tree-arcs. Now, pruning actions are done earlier, when 
backtracking to a state which knows that it leads to Accept. 

[Old state] 
  if q_target.L2A ∧ ¬q_source.L2A then 
    Pruning (q_source, m, Adj_source); 
  else if q_target ∉ Scc_Stack ∧ ¬q_target.L2A ∧ m ∈ A_I^vis then 
    remove (q_source, m, q_target) from →_vis; 
  q_source.L2A := q_source.L2A ∨ q_target.L2A; 

[Tree-arc backtrack] 
  if q_source.L2A ∧ ¬q_pred.L2A then 
    Pruning (q_pred, q_source.act, Adj_pred); 
  else if q_source ∉ Scc_Stack ∧ ¬q_source.L2A ∧ m' := q_source.act ∈ A_I^vis then 
    remove (q_pred, m', q_source) from →_vis; 
  q_pred.L2A := q_pred.L2A ∨ q_source.L2A; 



Remark: The resulting graph is a subgraph of CTG. Some controllability con- 
flicts persist in some states which have synthesized the L2A information w.r.t. 
their root only. We have to apply to this resulting graph the algorithm of the 
previous section to correct persistent conflicts. 



7 Tool 

Architecture and Algorithms: The algorithms presented in the paper are the 
basis of our prototype tool TGV, developed in collaboration with IRISA/INRIA 
Rennes and Verimag Grenoble [12,13,17,4]. In Section 2 we have presented all 
the IOLTS considered for test case generation: $S$, $TP$, $SP$, $SP_{vis}$. As TGV 
works on-the-fly, only the necessary parts of these IOLTS are constructed on 
demand in a lazy way. This imposes that the IOLTS are implicit and accessed through 
APIs giving the functions for their construction: the initial state, the transition 
relation and comparison of states. TGV also uses an API of CADP [11] to store 
parts of all intermediate IOLTS. TGV is interfaced with several different 
simulators which provide the API for S. In particular it is interfaced with the SDL 
simulator ObjectGeode from Verilog [3] and the LOTOS simulator from CADP. 

Another central algorithm is the one which computes $SP_{vis}$ from SP, 
already presented in [17]. It combines several aspects: addition of $\delta$ actions in the 
case of quiescence, $\tau$-reduction and determinization. All this is done on-the-fly, 
again with an adaptation of StrongConnect interleaved with a classical subset 
construction for determinization (see e.g. [15]). In this case StrongConnect is 
applied to the subgraph of SP composed of $\tau$-actions. Meanwhile, observable 
actions and target states are synthesized on top of the subgraphs and a subset 
construction is applied for determinization. StrongConnect starts from the initial 
state of SP and creates new initial states for subsequent calls to StrongConnect 
each time an observable action reaches a new state, until no new initial state is 
created. Links from states to their SCCs are stored, which avoids exploring an already 
computed SCC. The application of Tarjan's algorithm has linear complexity in 
time and space, but the subset construction involves an exponential blow-up in 
the size of the resulting graph. Nevertheless, as TGV is applied on-the-fly, we 
have been able to tackle examples with a lot of internal actions (specifications 
describing services, for example) where determinization was the bottleneck for 
methods with complete state graph generation. 

Case Studies: The first experiment with TGV [13] was done on an SDL 
specification of an ISDN protocol named DREX. Even with this embryonic version 
implementing the algorithm of [12], which did not completely work on-the-fly, 
TGV proved its efficiency and the quality of generated test cases compared to 
manual ones. TGV now works on-the-fly with the algorithms presented here and 
has been tried out on two industrial-size case studies. The first one is an 
SDL specification of the SSCOP protocol of the ATM, which has served for many 
other case studies. This study allowed us to combine static analysis techniques 
as a prelude to test generation and to use TGV on a multi-process specification in 
an asynchronous environment [4]. The second one is a LOTOS specification of a 
cache coherency protocol of a multiprocessor architecture of Bull [18]. The produced 
test cases have been executed on the real architecture and improved the test 
practice in a domain to which it was not originally dedicated. For these two case 
studies, on-the-fly generation proved its utility, as it was impossible to generate 
the complete state graphs. 

Comparison with other Techniques: Compared to TGV, methods based on 
automata theory (see e.g. [20]) have serious drawbacks. They need the 
construction of complete state graphs, which limits their use to small specifications. As 
in TGV, they need abstraction and reduction of internal actions, 
determinization and often minimization, but on the whole state graph of the specification. 
They need the construction of identifying sequences (UIO for example) and quite 
complex algorithms to build test cases, while TGV is linear in this phase. Their 
advantage is the complete coverage of a fault model but, as with other methods, this 
needs assumptions on the implementation such as fairness. Often determinism 
is required, but this is not realistic. A test suite is a monolithic sequence and 
does not correspond to hand-written test cases. Observable non-determinism 
(possible responses of the implementation to an input) is not really taken into 
account because output divergence of the specification w.r.t. the test sequence 
directly leads to an Inconclusive verdict. Nevertheless, test suites written by 
hand sometimes have identifying sequences that automatic tools should be able 
to produce. These sequences do not identify states of the state graph but control 
states of the specification. An idea is to use these methods on more abstract 
specifications (only the control part) in order to generate test purposes for control 
state identification. 

This leads us to the comparison with TVEDA. TGV and TVEDA are 
complementary tools. TVEDA can automatically produce test purposes that can be 
used by TGV. But generating test purposes automatically is in general not 
sufficient to cover all interesting behaviors, so users will still have to specify some 
test purposes by hand. 

In some aspects, TGV seems similar to Samstag [14], which uses test 
purposes specified by MSCs. But there are important differences. The first one is 
that the theory underlying Samstag is not clear. Nothing refers to any 
conformance relation or fault model, which prevents any proof of the correctness 
of generated test cases. Non-determinism is not taken into account because if 
a test purpose MSC does not define a "unique observable", it is rejected. A 
test purpose specified by an MSC must describe a complete sequence of 
observable events, which makes it difficult to write and prevents any abstraction. 
Finally, the algorithm is almost limited to checking that the MSC describes a 
(non-deterministic) behavior of the specification and completing the MSC with 
inputs leading to Inconclusive. 

Trojka [9] has common points with TGV. It is based on the same theoretical 
background [25]. It performs on-the-fly test case generation in the sense that it 
can simultaneously execute the test cases on the IUT. Trojka does not use test purposes 
as TGV does, but randomly chooses outputs of the test case among the possible ones and 
checks the validity of inputs according to the observable behavior of the 
specification. This necessitates a function similar to $\tau$-reduction and determinization. 
This has been implemented by a breadth-first traversal, which prevents the detection 
of livelocks and may duplicate some work, problems which are solved by TGV 
with the computation of SCCs. 

8 Conclusion 

We have presented a new on-the-fly test case generation algorithm based on a 
classical graph algorithm also used in some local and on-the-fly model-checkers. 
This algorithm has complexity linear in the size of the observable behavior of 
the product of a test purpose and a specification. It produces test cases of high 
quality, very similar to those written by hand, with choices and loops. This 
algorithm and those sketched in Section 7 are being transferred into the 
ObjectGeode tool from Verilog [3]. They will serve as the test generation engine, which 
will accept test purposes either written by hand, obtained by simulation, or 
automatically computed by a method derived from TVEDA with a coverage 
strategy [19]. However, what is lacking in TGV is a clever treatment of data. For the 
moment, the stress has been put on control, and data values are enumerated by 
the underlying simulation tools, which may lead to a state explosion for 
specifications with large value domains. But we have started to study the possibility 
of combining the algorithms of TGV with a constraint solver and abstract 
interpretation techniques, with the ambition of generating symbolic test cases with 
parameters and variables. Some ideas from previous works on TVEDA, for 
example [22], could also be helpful. 

Abstract. The theory of latency insensitive design is presented as the foundation of 
a new correct-by-construction methodology to design very large digital systems by 
assembling blocks of Intellectual Properties. Latency insensitive designs are synchronous 
distributed systems and are realized by assembling functional modules exchanging data 
on communication channels according to an appropriate protocol. The goal of the 
protocol is to guarantee that latency insensitive designs composed of functionally correct 
modules behave correctly independently of the wire delays. A latency-insensitive 
protocol is presented that makes use of relay stations buffering signals propagating along 
long wires. To guarantee correct behavior of the overall system, modules must satisfy 
weak conditions. The weakness of the conditions makes our method widely applicable. 



1 Introduction 

The level of integration available today with Deep Sub-Micron (DSM) technologies (0.1 µm 
and below) is so high that entire systems can now be implemented on a single chip. Designs 
of this kind expose problems that were barely visible at previous levels of integration: the 
dominance of wire delays on the chip and the strong effects created by the clock skew [2]. It 
is predicted that a signal will need more than five (and up to more than ten!) clock ticks to 
traverse the entire chip area. Thus it will be very important to limit the distance traveled by 
critical signals to guarantee the performance of the design. However, precise data on wire
lengths are available only late in the design process, and several costly re-designs may be needed
to change the placement or the speed of the components of the design to satisfy performance
and functionality constraints. We believe that, for deep sub-micron designs where millions of
gates are customary, a design method that guarantees by construction that certain properties
are satisfied is the only hope to achieve correct designs in a short time. In particular, we focus
on methods that allow a designer to compose pre-designed and verified components so that 
the composition formally satisfies certain properties. 

In this paper, we present a theory for the design of digital systems that maintains the
inherent simplicity of synchronous designs and yet does not suffer from the “long-wire” problem.
According to our methodology, the system can be thought of as completely synchronous, i.e., just
as a collection of modules that communicate by means of channels having a latency of one
clock cycle. Unfortunately, the final layout may require more than one clock cycle to transmit
the appropriate signals. Our methodology requires neither costly re-design cycles nor slowing
down the clock. The key idea is borrowed from pipelining: partition the long wires into
segments whose lengths satisfy the timing requirements imposed by the clock by inserting
logic blocks called relay stations, which have a function similar to that of latches on a
pipelined data-path. The timing requirements imposed by the real clock are now met by
construction. However, the latency of a channel connecting two modules is generally equal to
more than one real clock cycle. If the functionality of the design is based on the sequencing of
the output signals and not on their exact timing, then this modification of the design does not
change functionality provided that the components of the design are latency insensitive, i.e.,
the behavior of each module does not depend on the latency of the communication channels.
We have essentially traded off latency for throughput by not slowing down the clock and by
inserting relay stations. In this paper, we introduce these concepts formally and prove the
properties outlined above. Classical works on trace theory [3,7] and delay insensitive circuits
could be used to address our problem, but these approaches imply that the delay between
two events on a communication channel is completely arbitrary, while in our case we obtain
stronger results by assuming that this arbitrary delay is a multiple of the clock period.

The paper is organized as follows: in Section 2 we give the foundation of latency insensitive 
design by presenting the notion of patient processes. In Section 3 we discuss how in a system 
of patient processes communication channels can be segmented by introducing relay stations. 
Section 4 illustrates the overall design methodology and discusses under which assumption a
generic system can be transformed into a patient one.



2 Latency Insensitivity 



To cast our methodology in a formal framework, we use the approach of Lee and
Sangiovanni-Vincentelli to represent signals and processes [5].



2.1 The Tagged-Signal Model 

Given a set of values $V$ and a set of tags $T$, an event is a member of $V \times T$. Two events
are synchronous if they have the same tag. A signal $s$ is a set of events. Two signals are
synchronous if each event in one signal is synchronous with an event in the other signal and
vice versa. Synchronous signals must have the same set of tags.

The set of all $N$-tuples of signals is denoted $S^N$. A process $P$ is a subset of $S^N$. A particular
$N$-tuple $s \in S^N$ satisfies the process if $s \in P$. An $N$-tuple $s$ that satisfies a process is called
a behavior of the process. Thus a process is a set of possible behaviors ^1. A composition of
processes (also called a system) $\{P_1, \ldots, P_M\}$ is a process defined as the intersection of their
behaviors, $P = \bigcap_{m=1}^{M} P_m$. Since processes can be defined over different signal sets, to form
the composition we need to extend the set of signals over which each process is defined to
contain all the signals of all processes. Note that the extension changes the behavior of the
processes only formally.

Let $J = (j_1, \ldots, j_h)$ be an ordered set of integers in the range $[1, N]$; the projection of
a behavior $b = (s_1, \ldots, s_N) \in S^N$ onto $S^h$ is $proj_J(b) = (s_{j_1}, \ldots, s_{j_h})$. The projection of a
process $P \subseteq S^N$ onto $S^h$ is $proj_J(P) = \{ s' \mid \exists s \in P \wedge proj_J(s) = s' \}$. A connection $C$ is a
particularly simple process where two (or more) of the signals in the $N$-tuple are constrained
to be identical: for instance, $C(i, j, k) \subset S^N : (s_1, \ldots, s_N) \in C(i, j, k) \iff s_i = s_j = s_k$, with
$i, j, k \in [1, N]$.

In a synchronous system every signal in the system is synchronous with every other signal. 
In a timed system the set T of tags, also called timestamps, is a totally ordered set. The 
ordering among the timestamps of a signal s induces a natural order on the set of events of s. 
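
The notions of this section can be made concrete on small finite examples. The following is only an illustrative sketch: the names `tags`, `synchronous`, `compose`, `project` and `connection` are not part of the paper, signals are modeled as finite sets of `(value, tag)` pairs, processes as finite sets of behaviors, and signal indexes are 0-based.

```python
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

Event = Tuple[Hashable, int]      # (value, tag)
Signal = FrozenSet[Event]         # a signal is a set of events
Behavior = Tuple[Signal, ...]     # an N-tuple of signals


def tags(signal: Signal) -> Set[int]:
    """Set of tags carried by the events of a signal."""
    return {tag for _, tag in signal}


def synchronous(s1: Signal, s2: Signal) -> bool:
    """Two signals are synchronous when they carry the same set of tags."""
    return tags(s1) == tags(s2)


def compose(*processes: Iterable[Behavior]) -> Set[Behavior]:
    """Composition of processes over the same N signals: intersection of behaviors."""
    result = set(processes[0])
    for process in processes[1:]:
        result &= set(process)
    return result


def project(behavior: Behavior, indexes: Tuple[int, ...]) -> Behavior:
    """Projection of a behavior onto an ordered tuple of (0-based) signal indexes."""
    return tuple(behavior[j] for j in indexes)


def connection(universe: Iterable[Behavior], i: int, j: int) -> Set[Behavior]:
    """Behaviors of a finite universe in which signals i and j are identical."""
    return {b for b in universe if b[i] == b[j]}


# Hypothetical toy signals and processes, used only to exercise the helpers.
s_a = frozenset({("x", 0), ("y", 1)})
s_b = frozenset({("u", 0), ("v", 1)})
s_c = frozenset({("u", 0)})
p1 = {(s_a, s_b), (s_a, s_a)}
p2 = {(s_a, s_a), (s_b, s_b)}
```

Here `compose(p1, p2)` keeps only the behavior that both toy processes accept, which mirrors the definition of a system as an intersection of behavior sets.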



2.2 Informative Events and Stalling Events 

A latency insensitive system is a synchronous timed system whose set of values $V$ is equal
to $\Sigma \cup \{\tau\}$, where $\Sigma$ is the set of informative symbols which are exchanged among modules
and $\tau \notin \Sigma$ is a special symbol, representing the absence of an informative symbol. From now
on, all signals are assumed to be synchronous. The set of timestamps is assumed to be in
one-to-one correspondence with the set $\mathbb{N}$ of natural numbers. An event is called informative
if it has an informative symbol $\iota_i$ as value ^2. An event whose value is a $\tau$ symbol is called a
stalling event (or $\tau$ event).

^1 For $N \geq 2$, processes may also be viewed as a relation between the $N$ signals in $s$.

^2 We use subscripts to distinguish among the different informative symbols of $\Sigma$: $\iota_1, \iota_2, \iota_3, \ldots$

Definition 1. $\mathcal{E}(s)$ denotes the set of events of signal $s$, while $\mathcal{E}_\iota(s)$ and $\mathcal{E}_\tau(s)$ are respectively
the set of informative events and the set of stalling events of $s$. The $k$-th event of a
signal $s$ is denoted $e_k(s)$. $\mathcal{T}(s)$ denotes the set of timestamps in signal $s$, while $\mathcal{T}_\iota(s)$ is the
set of timestamps corresponding to informative events.

Processes exchange “useful” data by sending and receiving informative events. Ideally only
informative events should be communicated among processes. However, in a latency insensitive
system, a process may not have data to output at a given timestamp, thus requiring the
output of a stalling event at that timestamp.

Definition 2. The set of all sequences of elements in $\Sigma \cup \{\tau\}$ is denoted by $\Sigma_{lat}$. The length
of a sequence $\sigma$ is $|\sigma|$ if it is finite, otherwise it is infinity. The empty sequence is denoted as $\epsilon$
and, by definition, $|\epsilon| = 0$. The $i$-th term of a sequence $\sigma$ is denoted $\sigma_i$.

Definition 3. Function $\sigma : S \times T^2 \to \Sigma_{lat}$ takes a signal $s = \{(v_0, t_0), (v_1, t_1), \ldots\}$ and an
ordered pair of timestamps $(t_i, t_j)$, $i \leq j$, and returns a sequence $\sigma_{[t_i,t_j]}(s) \in \Sigma_{lat}$ s.t.
$\sigma_{[t_i,t_j]}(s) = v_i, v_{i+1}, \ldots, v_j$. The sequence of values of a signal is denoted $\sigma(s)$. The infinite subsequence of
values corresponding to an infinite sequence of events, starting from $t_i$, is denoted $\sigma_{[t_i,\infty]}(s)$.

For example, considering signal $s = \{(\iota_1, t_1), (\iota_2, t_2), (\tau, t_3), (\iota_2, t_4), (\iota_1, t_5), (\tau, t_6)\}$ we have ^3
$\sigma(s) = \iota_1\, \iota_2\, \tau\, \iota_2\, \iota_1\, \tau$, $\sigma_{[t_2,t_4]}(s) = \iota_2\, \tau\, \iota_2$, and respectively, $|\sigma(s)| = 6$,
$|\sigma_{[t_2,t_4]}(s)| = 3$, $|\sigma_{[t_5,t_5]}(s)| = 1$. To manipulate sequences of values we define the following
filtering operators.

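Definition 4. $\mathcal{F}_\iota : \Sigma_{lat} \to \Sigma_{lat}$ returns a sequence $\sigma' = \mathcal{F}_\iota[\sigma]$ s.t.

$$\sigma'_i = \begin{cases} \sigma_i & \text{if } \sigma_i \in \Sigma \\ \epsilon & \text{otherwise} \end{cases}$$

Definition 5. $\mathcal{F}_\tau : \Sigma_{lat} \to \Sigma_{lat}$ returns a sequence $\sigma' = \mathcal{F}_\tau[\sigma]$ s.t.

$$\sigma'_i = \begin{cases} \sigma_i & \text{if } \sigma_i = \tau \\ \epsilon & \text{otherwise} \end{cases}$$

For instance, if $\sigma(s) = \iota_1\, \iota_2\, \tau\, \iota_2\, \iota_1\, \tau$, then $\mathcal{F}_\iota[\sigma(s)] = \iota_1\, \iota_2\, \iota_2\, \iota_1$ and $\mathcal{F}_\tau[\sigma(s)] = \tau\, \tau$.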

Obviously, $|\sigma(s)| = |\mathcal{F}_\iota[\sigma(s)]| + |\mathcal{F}_\tau[\sigma(s)]|$. Latency insensitive systems are assumed to have a
finite horizon over which informative events appear, i.e., for each signal $s$ there is a greatest
timestamp $T \in \mathcal{T}_\iota(s)$ which corresponds to the “last” informative event. However, to build
our theory we need to extend the set of signals of a latency insensitive system over an infinite
horizon by adding a set of timestamps such that all events with timestamp greater than $T$
have $\tau$ values.
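
The sequence function and the two filtering operators are easy to experiment with on finite prefixes. The sketch below is illustrative only: a signal is modeled as a Python list whose index plays the role of the timestamp (so index 0 holds the first event), and `TAU`, `sigma`, `informative` and `stalling` are names chosen here, not notation from the paper.

```python
TAU = "tau"  # stand-in for the stalling symbol


def sigma(signal, i=0, j=None):
    """Values of the events with timestamps t_i..t_j (inclusive).

    `signal` is a list of values whose list index plays the role of the timestamp.
    """
    if j is None:
        j = len(signal) - 1
    return signal[i:j + 1]


def informative(seq):
    """Filter keeping only informative symbols (stalling symbols are dropped)."""
    return [v for v in seq if v != TAU]


def stalling(seq):
    """Filter keeping only stalling symbols."""
    return [v for v in seq if v == TAU]


# The six-event example signal; list index 0 holds the first event.
s = ["i1", "i2", TAU, "i2", "i1", TAU]
window = sigma(s, 1, 3)  # second to fourth event
```

On this signal the two filters split the six values into four informative symbols and two stalling symbols, in agreement with the length identity stated above.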

Definition 6. A signal $s$ is strict if and only if all informative events precede all stalling
events, i.e., iff there exists a $k \in \mathbb{N}$ s.t. $|\mathcal{F}_\tau[\sigma_{[t_0,t_k]}(s)]| = 0$ and $|\mathcal{F}_\iota[\sigma_{[t_{k+1},\infty]}(s)]| = 0$. A signal
which is not strict is said to be delayed (or stalled).



2.3 Latency Equivalence 

Two signals are latency equivalent if they present the same sequence of informative events, 
i.e., they are identical except for different delays between two successive informative events. 
Formally: 

Definition 7. Two signals $s_1, s_2$ are latency equivalent, $s_1 \equiv_\tau s_2$, iff $\mathcal{F}_\iota[\sigma(s_1)] = \mathcal{F}_\iota[\sigma(s_2)]$.

^3 In this paper we assume that for all timestamps $t_i, t_j \in \mathcal{T}(s)$, $t_i < t_j \iff i < j$.

The reference signal of a class of latency equivalent signals is a strict signal obtained by
assigning the sequence of informative values that characterizes the equivalence class to the
first $|\mathcal{F}_\iota[\sigma(s_i)]|$ timestamps. For instance, signals $s_1$ and $s_2$ presenting the following sequences
of values

$\sigma(s_1) = \iota_1\, \iota_2\, \tau\, \iota_1\, \iota_2\, \iota_3\, \tau\, \iota_1\, \iota_2\, \tau\, \tau\, \tau \ldots$
$\sigma(s_2) = \iota_1\, \iota_2\, \tau\, \tau\, \iota_1\, \tau\, \iota_2\, \iota_3\, \iota_1\, \tau\, \tau\, \iota_2\, \tau \ldots$

are latency equivalent. Their reference signal is characterized by the sequence of values

$\sigma(s_{ref}) = \iota_1\, \iota_2\, \iota_1\, \iota_2\, \iota_3\, \iota_1\, \iota_2\, \tau\, \tau\, \tau \ldots$

Latency equivalent signals contain the same sequences of informative values, but with 
different timestamps. Hence, it is useful to identify their informative events with respect to 
the common reference signal: the ordinal of an informative event coincides with its position 
in the reference signal. 
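
Latency equivalence and reference signals can be checked mechanically on finite prefixes. The following sketch assumes signals are finite lists of symbols with `TAU` as the stalling symbol; the function names and the two 13-event prefixes are illustrative choices in the spirit of the example above, not data taken verbatim from the paper.

```python
TAU = "tau"  # stand-in for the stalling symbol


def informative(seq):
    """Sequence of informative symbols, in order."""
    return [v for v in seq if v != TAU]


def latency_equivalent(s1, s2):
    """Same informative values in the same order, regardless of stalls."""
    return informative(s1) == informative(s2)


def reference(signal):
    """Strict signal of the same length: informative values first, then stalls."""
    values = informative(signal)
    return values + [TAU] * (len(signal) - len(values))


def is_strict(signal):
    """A finite prefix is strict when no informative value follows a stall."""
    return signal == reference(signal)


s1 = ["i1", "i2", TAU, "i1", "i2", "i3", TAU, "i1", "i2", TAU, TAU, TAU, TAU]
s2 = ["i1", "i2", TAU, TAU, "i1", TAU, "i2", "i3", "i1", TAU, TAU, "i2", TAU]
```

Both prefixes collapse to the same reference signal, which is the only strict member of their equivalence class over this finite horizon.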

Definition 8. The ordinal of an informative event $e_k = (v_k, t_k) \in \mathcal{E}_\iota(s)$ is defined as
$ord(e_k) = |\mathcal{F}_\iota[\sigma_{[t_0,t_k]}(s)]| - 1$. Let $s_i$ and $q_i$ be two latency equivalent signals: two informative
events $e_k(s_i) \in \mathcal{E}_\iota(s_i)$ and $e_l(q_i) \in \mathcal{E}_\iota(q_i)$ are said to be corresponding events iff
$ord(e_k(s_i)) = ord(e_l(q_i))$. The slack between two corresponding events is defined as
$slack(e_k(s_i), e_l(q_i)) = |k - l|$.
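
As a small worked illustration of ordinals and slack, the sketch below again models signals as finite lists indexed by timestamp. The helper names (`ordinal`, `index_of_ordinal`, `slack`) and the two prefixes are assumptions made for this example.

```python
TAU = "tau"  # stand-in for the stalling symbol


def ordinal(signal, k):
    """Ordinal of the informative event at index k: informative events in [0..k], minus one."""
    if signal[k] == TAU:
        raise ValueError("ordinals are defined for informative events only")
    return sum(1 for v in signal[:k + 1] if v != TAU) - 1


def index_of_ordinal(signal, n):
    """Index (timestamp) of the informative event whose ordinal is n."""
    count = -1
    for k, v in enumerate(signal):
        if v != TAU:
            count += 1
            if count == n:
                return k
    raise ValueError("no informative event with that ordinal")


def slack(s1, s2, n):
    """Slack between the corresponding events of ordinal n in two latency equivalent signals."""
    return abs(index_of_ordinal(s1, n) - index_of_ordinal(s2, n))


# Two latency equivalent prefixes (illustrative data).
s1 = ["i1", "i2", TAU, "i1", "i2", "i3"]
s2 = ["i1", "i2", TAU, TAU, "i1", TAU, "i2", "i3"]
```

The event carrying `i3` has ordinal 4 in both prefixes but sits two timestamps later in `s2`, so the slack of that pair of corresponding events is 2.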

We extend the notion of latency equivalence to behaviors, in a component-wise manner: 

Definition 9. Two behaviors $(s_1, \ldots, s_N)$ and $(s'_1, \ldots, s'_N)$ are equivalent iff $\forall i \ (s_i \equiv_\tau s'_i)$.

A behavior $b = (s_1, \ldots, s_N)$ is strict iff every signal $s_i \in b$ is strict. Every class of latency
equivalent behaviors contains only one strict behavior: this is called the reference behavior.



Definition 10. Two processes $P_1$ and $P_2$ are latency equivalent, $P_1 \equiv_\tau P_2$, if every behavior
of one is latency equivalent to some behavior of the other. A process $P$ is strict iff every
behavior $b \in P$ is strict. Every class of latency equivalent processes contains only one strict
process: the reference process.



Definition 11. A signal $s_1$ is latency dominated by $s_2$, $s_1 \leq_\tau s_2$, iff $s_1 \equiv_\tau s_2$ and $T_1 \leq T_2$,
with $T_k = \max \{ t \mid t \in \mathcal{T}_\iota(s_k) \}$, $k = 1, 2$.

Hence, referring to the previous example, signal $s_1$ is dominated by signal $s_2$ since $T_1 = 9$
while $T_2 = 12$. Notice that a reference signal is latency dominated by every signal belonging to
its equivalence class. Latency dominance is extended to behaviors and processes as in the case 
of latency equivalence. A total order among events of a behavior is necessary to develop our 
theory. In particular, we introduce an ordering among events that is motivated by causality: 
events that have smaller ordinal are ordered before the ones with larger ordinal (think of a 
strict process where the ordinal is related to the timestamp; the order implies that past events 
do not depend on future events). In addition, to avoid cyclic behaviors created by processing 
events with the same ordinal, we assume that there is an order among signals. This order in 
real-life designs corresponds to input-output dependencies. We cast this consideration in the 
most general form possible to extend maximally the applicability of our method. 

Definition 12. Given a behavior $b = (s_1, \ldots, s_N)$, $\leq_c$ denotes a well-founded order on its set
of signals. The well-founded order induces a lexicographic order $\leq_{lo}$ over the set of informative
events of $b$, s.t. for all pairs of events $(e_1, e_2)$ with $e_1 \in \mathcal{E}_\iota(s_i)$ and $e_2 \in \mathcal{E}_\iota(s_j)$

$e_1 \leq_{lo} e_2 \iff [\, (ord(e_1) < ord(e_2)) \vee (\, (ord(e_1) = ord(e_2)) \wedge (s_i \leq_c s_j) \,) \,]$
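
The lexicographic order can be pictured as sorting informative events by the pair (ordinal, position of the signal in the signal order). The sketch below is an illustration under assumptions: the signal order is encoded by a hypothetical `rank` list (smaller rank means earlier in the order), and signals are finite lists indexed by timestamp.

```python
TAU = "tau"  # stand-in for the stalling symbol


def lex_sorted_events(behavior, rank):
    """Informative events of a behavior sorted lexicographically.

    Each event is reported as (signal_index, timestamp). Events are compared
    first by ordinal and then by the rank of the signal they belong to.
    """
    keyed = []
    for idx, signal in enumerate(behavior):
        ordinal = -1
        for k, value in enumerate(signal):
            if value != TAU:
                ordinal += 1
                keyed.append((ordinal, rank[idx], idx, k))
    return [(idx, k) for _, _, idx, k in sorted(keyed)]


# Signal 0 precedes signal 1 in the assumed signal order.
behavior = [["a", TAU, "b"], [TAU, "c", "d"]]
order = lex_sorted_events(behavior, rank=[0, 1])
```

In this example the event `c` precedes `b` in the resulting order even though both could be compared by timestamp, because `c` has the smaller ordinal.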

The following function returns the first informative event (in signal $s_j$ of behavior $b$)
following an event $e \in b$ with respect to the lexicographic order $\leq_{lo}$.

Definition 13. Given a behavior $b = (s_1, \ldots, s_N)$ and an informative event $e(s_i) \in \mathcal{E}_\iota(s_i)$,
the function nextEvent is defined as: $nextEvent(s_j, e(s_i)) = \min_{e_k(s_j) \in \mathcal{E}_\iota(s_j)} \{ e(s_i) \leq_{lo} e_k(s_j) \}$



A stall move postpones an informative event of a signal of a given behavior by one
timestamp. The stall move is used to account for long delays along wires and to add delays where
needed to guarantee functional correctness of the design.

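Definition 14. Given a behavior $b = (s_1, \ldots, s_j, \ldots, s_N)$ and an informative event $e_k(s_j) = (v_k, t_k)$,
a stall move returns a behavior $b' = stall(e_k(s_j)) = (s_1, \ldots, s'_j, \ldots, s_N)$, s.t. for all $l \in \mathbb{N}$:

$\sigma_{[t_0,t_{k-1}]}(s'_j) = \sigma_{[t_0,t_{k-1}]}(s_j)$, $\sigma_{[t_k,t_k]}(s'_j) = \tau$, and $\sigma_{[t_{k+l+1},t_{k+l+1}]}(s'_j) = \sigma_{[t_{k+l},t_{k+l}]}(s_j)$.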
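
On a finite prefix, a stall move amounts to inserting one stalling symbol at the timestamp of the chosen informative event and shifting that event and everything after it by one position. A minimal sketch, assuming list-based signals and a function name `stall` chosen here for illustration:

```python
TAU = "tau"  # stand-in for the stalling symbol


def stall(signal, k):
    """Postpone the informative event at index k (and all later events) by one timestamp."""
    if signal[k] == TAU:
        raise ValueError("a stall move applies to an informative event")
    return signal[:k] + [TAU] + signal[k:]


s = ["a", "b", TAU, "c"]
stalled = stall(s, 1)  # postpone the event carrying "b"
```

The stalled signal carries the same informative values in the same order, so it is latency equivalent to the original one.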

A procrastination effect represents the “effect” of a stall move $stall(e_k(s_j))$ on other signals
of behavior $b$ in correspondence of events following $e_k(s_j)$ in the lexicographic order. The
processes will “respond” to the insertion of stalls in some of their signals by “delaying” other
signals that are causally related to the stalled signals.

Definition 15. A procrastination effect is a point-to-set map which takes a behavior
$b' = (s'_1, \ldots, s'_N) = stall(e_k(s_j))$ resulting from the application of a stall move on event $e_k(s_j)$
of behavior $b = (s_1, \ldots, s_N)$ and returns a set of behaviors $\mathcal{PE}[stall(e_k(s_j))]$ s.t.
$b'' = (s''_1, \ldots, s''_N) \in \mathcal{PE}[b']$ iff

— $\forall i \in [1, N], i \neq j$, $s''_i \equiv_\tau s'_i$ and $\sigma_{[t_0,t_{l-1}]}(s''_i) = \sigma_{[t_0,t_{l-1}]}(s'_i)$, where $t_l$ is the timestamp of
event $e_l(s'_i) = nextEvent(s_i, e_k(s_j))$;

— $\exists K$ finite s.t. $\forall i \in [1, N], i \neq j$, $\exists k_i \leq K$, $\sigma_{[t_{l+k_i},\infty]}(s''_i) = \sigma_{[t_l,\infty]}(s'_i)$.

Each behavior in $\mathcal{PE}[b']$ is obtained from $b'$ by possibly inserting other stalling events in
any signal of $b'$, but only at “later” timestamps, i.e. to postpone informative events which
follow $e_k(s_j)$ with respect to the lexicographic order $\leq_{lo}$. Observe that a procrastination
effect returns behaviors that latency dominate the original behavior.



2.4 Patient Processes 

We are now ready to define the notion of patient process: a patient process can take stall 
moves on any signal of its behaviors by reacting with the appropriate procrastination effects. 
Patience is the key condition for the IP blocks to be combinable according to our method. The 
following theorems ^4 guarantee that, for patient processes, the notion of latency equivalence
of processes is compositional. 

Definition 16. A process $P$ is patient iff

$\forall b = (s_1, \ldots, s_N) \in P, \ \forall j \in [1, N], \ \forall e_k(s_j) \in \mathcal{E}_\iota(s_j), \ (\mathcal{PE}[stall(e_k(s_j))] \cap P \neq \emptyset)$

Hence, the result of a stall move on one of the events of a patient process may not satisfy the 
process, but one of the behaviors of the procrastination effect corresponding to the stall move 
does satisfy the process, i.e., if we stall a signal on an input to a functional block, the block 
will be forced to delay some of its outputs or if we request an output signal to be delayed 
then an appropriate delay has to be added to the inputs. 

Lemma 1. Let $P_1$ and $P_2$ be two patient processes. Let $b_1 \in P_1$, $b_2 \in P_2$ be two behaviors
with the same lexicographic order s.t. $b_1 \equiv_\tau b_2$. Then, there exists a behavior $b' \in (P_1 \cap P_2)$,
$b_1 \equiv_\tau b' \equiv_\tau b_2$.

Theorem 1. If $P_1$ and $P_2$ are patient processes then $(P_1 \cap P_2)$ is a patient process.

^4 The proofs of the lemmas and the theorems presented in this paper can be found in [1].

Theorem 2. For all patient processes $P_1, P_2, P'_1, P'_2$, if $P_1 \equiv_\tau P'_1$ and $P_2 \equiv_\tau P'_2$ then
$(P_1 \cap P_2) \equiv_\tau (P'_1 \cap P'_2)$.

Therefore, we can replace any process in a system of patient processes by a latency
equivalent process, and the resulting system will be latency equivalent. A similar theorem holds
for replacing strict processes with patient processes.

Theorem 3. For all strict processes $P_1, P_2$ and patient processes $P'_1, P'_2$, if $P_1 \equiv_\tau P'_1$ and
$P_2 \equiv_\tau P'_2$ then $(P_1 \cap P_2) \equiv_\tau (P'_1 \cap P'_2)$.

This means that we can replace all processes in a system of strict processes by corresponding
patient processes, and the resulting system will be latency equivalent. This is the core of
our idea: take a design based on the assumption that computation in one functional block and
communication among blocks “take no time” (synchronous hypothesis) ^5, i.e., the processes
corresponding to the functional blocks and their composition are strict, and replace it with
a design where communication does take time (more than one clock cycle) and, as a result,
signals are delayed, but without changing the sequence of informative events observed at the
system level, i.e., with a set of patient processes.



3 Latency Insensitive Design 

As explained in Section 1, one of the goals of the latency insensitive design methodology
is to be able to “pipeline” a communication channel by inserting an arbitrary number of
memory elements. In the framework of our theory, this operation corresponds to adding some
particular processes, called relay stations, to the given system. In this section, we first show
how patient systems (i.e. systems of patient processes) are insensitive to the insertion of relay
stations and, then, we discuss under which assumption a generic system can be transformed
into a patient system.



3.1 Channels and Buffers 

A channel is a connection ^6 constraining two signals to be identical.

Definition 17. A channel $C(i,j) \subset S^N$, $i, j \in [1, N]$, is a process s.t. $b = (s_1, \ldots, s_N) \in C(i,j) \iff s_i = s_j$.

Lemma 2. A channel $C(i,j) \subset S^N$ is not a patient process.



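Definition 18. A buffer $B^{c}_{l_f,l_b}(i,j)$ with capacity $c \geq 0$, minimum forward latency $l_f \geq 0$ and
minimum backward latency $l_b \geq 0$ is a process s.t. $i, j \in [1, N]$: $b = (s_1, \ldots, s_N) \in B^{c}_{l_f,l_b}(i,j)$
iff $(s_i \equiv_\tau s_j)$ and $\forall k \in \mathbb{N}$

$$0 \leq |\mathcal{F}_\iota[\sigma_{[t_0,t_{k-l_f}]}(s_i)]| - |\mathcal{F}_\iota[\sigma_{[t_0,t_k]}(s_j)]| \qquad (1)$$

$$c \geq |\mathcal{F}_\iota[\sigma_{[t_0,t_k]}(s_i)]| - |\mathcal{F}_\iota[\sigma_{[t_0,t_{k-l_b}]}(s_j)]| \qquad (2)$$

By definition, given a pair of indexes $i, j \in [1, N]$, for all $l_b, l_f, c \geq 0$, all buffers $B^{c}_{l_f,l_b}(i,j)$
are latency equivalent. Observe also that buffer $B^{0}_{0,0}(i,j)$ coincides with channel $C(i,j)$. In
particular, we are interested in buffers having unitary latencies and we want to establish under
which conditions such buffers are patient processes.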
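
To get a feel for the two buffer inequalities, one can check them on finite prefixes of an input signal and an output signal. The sketch below is an assumption-laden reading of the definition: inequality (1) is taken to say that the output never carries more informative events than the input had produced `l_f` timestamps earlier, inequality (2) that the number of events in flight never exceeds the capacity, and a prefix ending before timestamp 0 is taken to contain no events. The names `count_informative` and `in_buffer` are illustrative.

```python
TAU = "tau"  # stand-in for the stalling symbol


def count_informative(signal, k):
    """Number of informative events with timestamp in [0..k]; zero when k < 0."""
    return sum(1 for v in signal[:max(k + 1, 0)] if v != TAU)


def in_buffer(s_i, s_j, capacity, l_f, l_b):
    """Check the buffer conditions on two finite signals of equal length."""
    if len(s_i) != len(s_j):
        raise ValueError("signals must cover the same timestamps")
    same_data = [v for v in s_i if v != TAU] == [v for v in s_j if v != TAU]
    for k in range(len(s_i)):
        forward = count_informative(s_i, k - l_f) - count_informative(s_j, k)
        occupancy = count_informative(s_i, k) - count_informative(s_j, k - l_b)
        if forward < 0 or occupancy > capacity:
            return False
    return same_data


# One informative event every other timestamp, delayed by one cycle.
half_rate = (["a", TAU, "b", TAU, TAU], [TAU, "a", TAU, "b", TAU])
# One informative event per timestamp, delayed by one cycle.
full_rate = (["a", "b", "c", TAU], [TAU, "a", "b", "c"])
```

Under this reading, the half-rate trace fits a unit-latency buffer of capacity 1, while the full-rate trace needs capacity 2, and with zero capacity and zero latencies only identical signals pass the check.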

Theorem 4. Let $l_b = l_f = 1$. For all $c \geq 1$, $B^{c}_{1,1}(i,j)$ is patient iff $s_i \leq_c s_j$.



^5 In other words, communication and computation are completed in one clock cycle.
^6 See section 2.1 for the definition of connection.
Fig. 1. Comparing two possible behaviors of finite buffers $B^{1}_{1,1}(i,j)$ and $B^{2}_{1,1}(i,j)$.

Consider a strict system $P_{strict} = \bigcap_{m=1}^{M} P_m$ with $N$ strict signals $s_1, \ldots, s_N$. As explained
in section 2.1, processes can be defined over different signal sets and to compose them we
may need to formally extend the set of signals of each process to contain all the signals of
all processes. However, without loss of generality, consider the particular case of composing
$M$ processes which are already defined on the same $N$ signals. Hence, any generic behavior
$b_m = (s_{m_1}, \ldots, s_{m_N})$ of $P_m$ is also a behavior of $P_{strict}$ iff for all $l \in [1, M]$, $l \neq m$, process
$P_l$ contains a behavior $b_l = (s_{l_1}, \ldots, s_{l_N})$ s.t. $\forall n \in [1, N]$ $(s_{l_n} = s_{m_n})$. In fact, we may
assume to derive system $P_{strict}$ by connecting the $M$ processes with $(M - 1) \cdot N$ channel
processes $C(l_n, (l+1)_n)$, where $l \in [1, (M-1)]$ and $n \in [1, N]$. Further, we may also assume to
“decompose” any channel process $C(m_n, l_n)$ with an arbitrary number $X$ of channel processes
$C(m_n, x_1), C(x_1, x_2), \ldots, C(x_{X-1}, l_n)$, by adding $X - 1$ auxiliary signals, each of them forced
to be equal to $s_{m_n} = s_{l_n}$. The theory developed in section 2 guarantees that if we replace each
process $P_m \in P_{strict}$ with a latency equivalent patient process and each channel $C(i,j)$ with
a patient buffer $B^{1}_{1,1}(i,j)$ we obtain a system $P_{patient}$ which is patient and latency equivalent
to $P_{strict}$. In fact, having a patient buffer $B^{1}_{1,1}(i,j)$ in a patient system is equivalent to having a
channel $C(i,j)$ in a strict system. Since “decomposing” a channel $C(i,j)$ has no observable effect
on a strict system, we are therefore free to add an arbitrary number of patient buffers into
the corresponding patient system to replace this channel. Since we use patient buffers with
unitary latencies, we can distribute them along the long wire on the chip which implements
$C(i,j)$, in such a way that the wire gets decomposed into segments whose physical lengths can
be spanned in a single physical clock cycle.



3.2 Relay Stations 

The following Lemma 3 proves that no behaviors in $B^{1}_{1,1}(i,j)$ may contain two informative
events of $s_i$, $s_j$ which are synchronous: this implies that the maximum achievable throughput
across such a buffer is 0.5, which may be considered suboptimal. Instead, buffer $B^{2}_{1,1}(i,j)$ is the
minimum capacity buffer which is able to “transfer” one informative unit per timestamp, thus
allowing, in the best case, communication with maximum throughput equal to 1. Figure 1
compares two possible behaviors of these buffers.

Lemma 3. $B^{2}_{1,1}(i,j)$ is the minimum capacity buffer with $l_f = l_b = 1$ s.t.

$\exists (s_1, \ldots, s_N) \in B^{2}_{1,1}(i,j) \wedge \exists k \in \mathbb{N}, \ (e_k(s_i) \in \mathcal{E}_\iota(s_i) \wedge e_k(s_j) \in \mathcal{E}_\iota(s_j))$

Definition 19. The buffer $B^{2}_{1,1}(i,j)$ is called a relay station $RS$.



4 Latency Insensitive Design Methodology 

In this section, we move towards the implementation of the theory introduced in the previous 
sections. To do so, we assume that: 

— the pre-designed functional blocks are synchronous processes; 

— there is a set of signals for each process that can be considered as inputs to the process
and a set of signals that can be considered as outputs of the process, i.e., the processes
are functional;

— the processes are strictly causal (a process is strictly causal if two outputs can only be
different at timestamps that strictly follow the timestamps when the inputs producing
these outputs show a difference ^7).

— the processes belong to a particular class of processes called stallable, a weak condition 
to ask the processes to obey. 

The basic ideas are as follows. Composing a set of pre-designed synchronous functional blocks 
in the most efficient way is fairly straightforward if we assume that the synchronous hypothesis 
holds. This composition corresponds to a composition of strict processes since there is a priori 
no need of inserting stalling events. However, as we have argued in the introduction, it is very 
likely that the synchronous hypothesis will not be valid for communication. If indeed the 
processes to be composed are patient, then adding an appropriate number of relay stations 
yields a process that is latency equivalent to the strict composition. Hence, if we use as the 
definition of correct behavior the fact that the sequences of informative events do not change, 
the addition of the relay stations solves the problem. However, requiring processes to be 
patient at the onset is quite strong. Still, in practice, a patient system can be derived from
a strict one as follows: first, we take each strict process $P_m$ and we compose it with a set
of auxiliary processes to obtain an equivalent patient process $P'_m$. To be able to do so, all
processes $P_m$ must satisfy a simple condition (the processes must be stallable) specified in
the next section. Then, we put together all patient processes by connecting them with relay
stations. The set of auxiliary processes implements a “queuing mechanism” across the signals
of $P_m$ in such a way that informative events are buffered and reordered before being passed
to $P_m$: informative events having the same ordinal are passed to $P_m$ synchronously.

In the sequel, we first introduce the formal definition of functional processes. Then, we 
present the simple notion of stallable processes and we prove that every stallable process 
can be encapsulated into a wrapper process which acts as an interface towards a latency 
insensitive protocol. 

4.1 Stallable Processes 

An input to a process $P \subseteq S^N$ is an externally imposed constraint $P_I \subseteq S^N$ such that $P_I \cap P$ is
the total set of acceptable behaviors. Commonly, one considers processes having input signals
and output signals: in this case, given process $P$, the set of signals can be partitioned into
three disjoint subsets by partitioning the index set as $\{1, \ldots, N\} = I \cup O \cup R$, where $I$ is the
ordered set of indexes for the input signals of $P$, $O$ is the ordered set of indexes for the output
signals and $R$ is the ordered set of indexes for the remaining signals (also called irrelevant
signals with respect to $P$). A process is functional with respect to $(I, O)$ if for all behaviors
$b \in P$ and $b' \in P$, $proj_I(b) = proj_I(b')$ implies $proj_O(b) = proj_O(b')$.

In the sequel, we consider only strictly causal processes and for each of them we assume
that the well-founded order of definition 12 subsumes the causality relations among its
signals, i.e. formally: $\forall i \in I, \forall j \in O, (s_i \leq_c s_j)$.

Definition 20. A process $P$ with $I = \{1, \ldots, Q\}$ and $O = \{Q+1, \ldots, N\}$ is stallable iff for
all $b = (s_1, \ldots, s_Q, s_{Q+1}, \ldots, s_N) \in P$ and for all $k \in \mathbb{N}$:

$\forall i \in I \ (\sigma_{[t_k,t_k]}(s_i) = \tau) \iff \forall j \in O \ (\sigma_{[t_{k+1},t_{k+1}]}(s_j) = \tau)$

Hence, while a patient process tolerates arbitrary distributions of stalling events among its
signals (as long as causality is preserved), a stallable process demands more regular patterns:
$\tau$ symbols can only be inserted synchronously (i.e., with the same timestamp) on all input
signals and this insertion implies the synchronous insertion of $\tau$ symbols on all output signals
at the following timestamp. To assume that a functional process is stallable is quite reasonable
with respect to a practical implementation. In fact, most hardware systems can be stalled:
for instance, consider any sequential logic block that has a gated clock or a Moore finite state
machine $M$ with an extra input that, if equal to $\tau$, forces $M$ to stay in the current state and
to emit $\tau$ at the next cycle.
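
The Moore machine remark can be sketched in a few lines. This is a toy model under stated assumptions, not an implementation from the paper: the machine has a single input stream, `step` and `out` are hypothetical next-state and output functions, the output at cycle k+1 reflects the state reached after consuming input k, and a stalling input freezes the state and produces a stalling output one cycle later.

```python
TAU = "tau"  # stand-in for the stalling symbol


def run_stallable(step, out, state, inputs):
    """Run a Moore machine that can be stalled by TAU inputs.

    outputs[0] is the output of the initial state; outputs[k + 1] answers inputs[k].
    """
    outputs = [out(state)]
    for x in inputs:
        if x == TAU:
            outputs.append(TAU)          # state frozen, tau emitted at the next cycle
        else:
            state = step(state, x)
            outputs.append(out(state))
    return outputs


def accumulate(state, x):
    """Toy next-state function: running sum."""
    return state + x


def identity(state):
    """Toy output function: expose the state."""
    return state


stalled_run = run_stallable(accumulate, identity, 0, [1, TAU, 2, 3])
plain_run = run_stallable(accumulate, identity, 0, [1, 2, 3])
```

Dropping the stalling symbols from `stalled_run` gives back `plain_run`, i.e. stalling the inputs delays the outputs without changing the sequence of informative values.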

^7 For a more formal definition see [5].
Fig. 2. Example of a behavior of an equalizer E with I = {1,2,3} and O = {4,5,6}.



4.2 Encapsulation of Stallable Processes 

Now, our goal is to define a group of functional processes that can be composed with a
stallable process $P$ to derive a patient process which is latency equivalent to $P$. We start
by considering a process that aligns all the informative events across a set of channels.

Definition 21. An equalizer $E$ is a process, with $I = \{1, \ldots, Q\}$ and $O = \{Q+1, \ldots, 2 \cdot Q\}$,
s.t. for all behaviors $b = (s_1, \ldots, s_Q, s_{Q+1}, \ldots, s_{2 \cdot Q}) \in E$: $\forall i \in I, (s_i \equiv_\tau s_{Q+i})$ and $\forall k \in \mathbb{N}$

$\forall i, j \in O \ \big( \sigma_{[t_k,t_k]}(s_i) = \tau \Rightarrow \sigma_{[t_k,t_k]}(s_j) = \tau \big)$

$\min_{i \in I} \{ |\mathcal{F}_\iota[\sigma_{[t_0,t_k]}(s_i)]| \} - \max_{j \in O} \{ |\mathcal{F}_\iota[\sigma_{[t_0,t_k]}(s_j)]| \} \geq 0$

The first relation forces the output signals to have stalling events only synchronously, while
the second guarantees that at every timestamp the number of informative events that have occurred at
any input is never smaller than the number of informative events that have occurred at any output.
In particular, the presence of a stalling event at any input at a given timestamp forces the
presence of a stalling event on all outputs at the same timestamp. Figure 2 illustrates a
possible behavior of an equalizer.
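
The three properties stated in the paragraph above can be checked on finite prefixes of candidate behaviors. The sketch below is one possible reading of the prose, not the paper's definition: outputs must stall together, no output may run ahead of any input, a stalled input stalls every output at the same timestamp, and on a finite prefix each output is only required to replay a prefix of its input data. The name `respects_equalizer` is illustrative.

```python
TAU = "tau"  # stand-in for the stalling symbol


def count_informative(signal, k):
    """Number of informative events with timestamp in [0..k]."""
    return sum(1 for v in signal[:k + 1] if v != TAU)


def respects_equalizer(inputs, outputs):
    """Check a finite behavior prefix against the equalizer properties described above."""
    horizon = len(inputs[0])
    for s_in, s_out in zip(inputs, outputs):
        data_in = [v for v in s_in if v != TAU]
        data_out = [v for v in s_out if v != TAU]
        if data_out != data_in[:len(data_out)]:
            return False                      # outputs must replay the input data in order
    for k in range(horizon):
        stalled = [s[k] == TAU for s in outputs]
        if any(stalled) and not all(stalled):
            return False                      # outputs stall only synchronously
        if any(s[k] == TAU for s in inputs) and not all(stalled):
            return False                      # a stalled input stalls every output
        fewest_in = min(count_informative(s, k) for s in inputs)
        most_out = max(count_informative(s, k) for s in outputs)
        if fewest_in < most_out:
            return False                      # no output runs ahead of any input
    return True


ins = [["a", "b", TAU, "c"], ["d", TAU, "e", "f"]]
aligned = [["a", TAU, TAU, "b"], ["d", TAU, TAU, "e"]]
misaligned = [["a", "b", TAU, TAU], ["d", TAU, TAU, "e"]]
```

In the `aligned` outputs, informative events with the same ordinal leave the equalizer at the same timestamp, which is the alignment the wrapper relies on.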



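Definition 22. An extended relay station $\mathcal{ERS}$ is a process with $I = \{i\}$ and $O = \{j, l\}$,
$i \neq j \neq l$, s.t. signals $s_i, s_j$ are related by inequalities (1) and (2) of definition 18 (with
$l_b = l_f = 1$ and $c = 2$) and $\forall k \in \mathbb{N}$:

$$\sigma_{[t_k,t_k]}(s_l) = \begin{cases} 1 & \text{if } |\mathcal{F}_\iota[\sigma_{[t_0,t_k]}(s_i)]| - |\mathcal{F}_\iota[\sigma_{[t_0,t_{k-1}]}(s_j)]| = 2 \\ 0 & \text{otherwise} \end{cases}$$

Definition 23. A stalling signal generator is a process with $I = \{1, \ldots, Q\}$ and $O = \{Q+1\}$
s.t. $\forall b = (s_1, \ldots, s_{Q+1})$, $\forall k \in \mathbb{N}$, $\forall i \in [1, Q]$, $(\sigma_{[t_k,t_k]}(s_i) \in \{0, 1\})$ and

$$\sigma_{[t_k,t_k]}(s_{Q+1}) = \begin{cases} \tau & \text{if } \exists j \in [1, Q] \ (\sigma_{[t_k,t_k]}(s_j) = 1) \\ 0 & \text{otherwise} \end{cases}$$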

As illustrated in Figure 3, any stallable process $P$ can be composed with an equalizer, a
stalling signal generator and some extended relay stations to derive a patient process which
is latency equivalent to $P$.

Definition 24. Let $P$ be a stallable process with $I_P = \{p'_1, \ldots, p'_M\}$ and $O_P = \{q'_1, \ldots, q'_N\}$.
A wrapper process (or shell process) $W(P)$ of $P$ is the process with $I_W = \{p_1, \ldots, p_M\}$ and
$O_W = \{q_1, \ldots, q_N\}$ which is obtained by composing $P$ with the following processes:

— an equalizer $E$ with $I_E = \{p_1, \ldots, p_M, p_{M+1}\}$ and $O_E = \{p'_1, \ldots, p'_M, p'_{M+1}\}$;

— $N$ extended relay stations $\mathcal{ERS}_1, \mathcal{ERS}_2, \ldots, \mathcal{ERS}_N$ s.t. $I_j = \{q'_j\}$ and $O_j = \{q_j, r_j\}$, with $j \in [1, N]$;

— a stalling signal generator with $I_G = \{r_1, \ldots, r_N\}$ and $O_G = \{p_{M+1}\}$.

Theorem 5. Let $W(P)$ be the wrapper process of def. 24. Process $W = proj_{I_W \cup O_W}(W(P))$
is a patient process that is latency equivalent to $P$.

Fig. 3. Encapsulation of a stallable process P into a wrapper W(P).

In conclusion, our latency insensitive design methodology can be summarized as follows: 

1. Begin with a system of M stallable processes and N channels. 

2. Encapsulate each stallable process to yield a wrapper process. 

3. Using relay stations, decompose each channel into segments whose physical length can be
spanned in a single physical clock cycle.
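
For step 3, a back-of-the-envelope count of relay stations per channel can be sketched as follows. This is an assumption, not a formula from the paper: the post-layout wire delay and the clock period are taken to be known in picoseconds, every segment must be traversable within one period, and any delay of the relay station itself is ignored. The function name is hypothetical.

```python
def relay_stations_needed(wire_delay_ps: int, clock_period_ps: int) -> int:
    """Relay stations to insert so that each wire segment fits in one clock period."""
    if wire_delay_ps < 0 or clock_period_ps <= 0:
        raise ValueError("delay must be non-negative and period positive")
    # Ceiling division on integers avoids floating-point rounding issues.
    segments = max(1, -(-wire_delay_ps // clock_period_ps))
    return segments - 1


stations = relay_stations_needed(wire_delay_ps=2500, clock_period_ps=1000)
```

A wire that already meets the clock period needs no relay station, which matches the view that the original channel is simply kept when timing is met.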

This approach clearly “orthogonalizes” computation and communication: in fact, we can build
systems by putting together hardware cores (which can be arbitrarily complex as long as they
satisfy the stalling assumption) and wrappers (which interface them with the channels, by
“speaking” the latency insensitive protocol). While the specific functionality of the system is
distributed in the cores, the wrappers can be automatically generated around them. Finally,
the validation of the system can now be efficiently decomposed based on assume-guarantee
reasoning [4,6]: each wrapper is verified assuming a given protocol, and the protocol is verified
separately.



5 Conclusions and Future Work 

A new design methodology for large digital systems implemented in DSM technology has been
presented. The methodology is based on the assumption that the design is built by assembling
blocks of Intellectual Properties (IPs) that have been designed and verified previously.
The main goal is to develop a theory for the composition of the IP blocks that ensures the
correctness of the overall design. The focus is on timing properties since DSM designs suffer
(and will continue to suffer even more for the foreseeable future) from delays on long wires
that often cause costly redesigns. Designs carried out with our methodology are called
latency insensitive designs. Latency insensitive designs are synchronous distributed systems and
are realized by assembling functional modules exchanging data on communication channels
according to a latency-insensitive protocol. The protocol guarantees that latency insensitive
designs composed of functionally correct modules behave correctly independently of the wire
delays. This allows us to pipeline long wires by inserting special memory elements called relay
stations. The protocol works on the assumption that the functional blocks satisfy certain weak
properties. The method trades off latency for throughput, hence it is important to optimize
the amount of latency that we must allow to obtain correct designs. This optimization leads
to the concept of speculative latency insensitive protocols which will be the subject of a future
paper.
Abstract. We consider symbolic verification for a class of parameterized
systems, where a system consists of a linear array of processes, and where 
an action of a process may in general be guarded by both local conditions 
restricting the state of the process about to perform the action, and 
global conditions defining the context in which the action is enabled. 
Such actions are present, e.g., in idealized versions of mutual exclusion 
protocols, such as the bakery and ticket algorithms by Lamport, Burns'
protocol, Dijkstra’s algorithm, and Szymanski’s algorithm. The presence
of both local and global conditions makes the parameterized versions of 
these protocols infeasible to analyze fully automatically, using existing 
model checking methods for parameterized systems. In all these methods 
the actions are guarded only by local conditions involving the states of 
a finite set of processes. 

We perform verification using a standard symbolic reachability algorithm 
enhanced by an operation to accelerate the search of the state space. 
The acceleration operation computes the effect of an arbitrary number 
of applications of an action, rather than a single application. This is 
crucial for convergence of the analysis e.g. when applying the algorithm 
to the above protocols. 

We illustrate the use of our method through an application to Szym- 
anski’s algorithm. 



1 Introduction 

Much attention has recently been paid to extending the applicability of model
checking to infinite-state systems. One reason why a program may have an
infinite state space is that it operates on unbounded data structures. Examples
of such systems include timed automata [ACD90], data-independent systems
[Wol86], relational automata [Cer94], pushdown processes [BS95], and
lossy channel systems [AJ96]. Another reason is that the program has an infinite
control part. This is the case e.g. in Petri nets [Esp95,Jan90], and parameterized
systems, in which the topology of the system is parameterized by the number of
processes inside the system. In verification of parameterized systems, we are often
interested in proving the correctness of the system regardless of the number of
processes. Verification algorithms for systems consisting of an unbounded number
of similar or identical finite-state processes include [GS92,AJ98,KMM+97],
and (using a manually supplied induction hypothesis) [CGJ95,KM89,WL89].

In this paper we consider algorithmic verification of a class of parameterized 
systems, intended to capture at least the behaviours of several mutual-exclusion 
algorithms that can be found in the literature. Examples of mutual exclusion 
algorithms that work for an arbitrary number of processes are: the bakery and 
ticket algorithms by Lamport, Burn’s protocol, Dijkstra’s algorithm, and Szy- 
manski’s algorithm. These algorithms are implemented on systems with an ar- 
bitrary number of processes with linearly ordered identities. The ordering of the 
processes may reflect the actual physical ordering (e.g. Szymanski’s algorithm), 
or the values assigned to local variables inside processes (e.g. the ticket given to 
each process during the execution of Lamport’s bakery protocol). A configura- 
tion of the system can be described as a string representing the local states of 
the processes. A common feature which places these protocols outside the scope 
of existing model checking methods, is that an action of a process is in general 
guarded by both local and global conditions on the processes. Local conditions 
restrict the state of the process which is about to perform the action. Global 
conditions define the context in which the action is allowed to occur. A context is 
typically stated as a formula which is quantified over the set of processes inside 
the system. Examples of contexts are “all processes with lower identities should 
have local states belonging to a given set”, or “there should be at least one process
with a higher identity which has a local state included in a given set”, etc. We 
propose a model which combines both types of conditions. An action involves 
the change of local state of a process, and may be conditioned on both the local 
state, and the context in which the action is performed. 

To verify our protocols we perform a standard symbolic forward reachabi- 
lity analysis, using regular expressions to represent (possibly infinite) sets of 
configurations. It is well-known that checking most safety properties (including 
satisfiability of mutual exclusion) can be reduced to checking the reachability 
of a set of “bad” configurations (in our case specified as a regular expression). 
However, the presence of both local and global guards implies that the standard 
reachability algorithm will not terminate when applied to any of the earlier men- 
tioned protocols. A main contribution of this paper is that we define an operation 
to accelerate the search through the state space. The acceleration operator com- 
putes the effect of an arbitrary number of applications of an action, rather than 
the effect of only a single application. This is crucial for obtaining termination 
during the analysis of any of the above protocols. Notice that the algorithm is 
incomplete and may in general still fail to terminate. 

Related Work There are several results on verification of parameterized systems
[GS92,AJ98,CGJ95,KMM+97]. In all these works the actions are guarded
only by local conditions involving the states of a finite set of processes. A work
which is close in spirit to ours is [KMM+97]. The authors propose to use regular
sets of strings to represent states of parameterized arrays of processes, and to
represent the effect of performing an action by a predicate transformer (transducer).
However, the work in [KMM+97] considers only transducers that represent
the effect of a single application of a transition. This means that their approach
will not terminate if applied to reachability analysis for the protocols we consider
in this paper. In contrast, we introduce an acceleration operator for actions with
both local and global contexts, which lets reachability analysis terminate on these protocols.
Applications of acceleration operations are reported in the context of communi- 
cating finite state automata [BG96,BGWW97,BH97,ABJ98]. The acceleration 
operation is applied to transitions of different types than in our work, namely 
those that iterate a single loop in the control part of a program, rather than 
repetitive applications of a transition to different processes in the system. There
have also been a number of case studies in verification of mutual exclusion protocols
such as Burn’s protocol [JL98] and Szymanski’s algorithm [GZ98,MAB+94,MP90].
The verification in each case is dependent on abstraction functions or
lemmas explicitly provided by the user. 

Outline In the next section, we define the class of system models that we 
consider and illustrate it by an idealized version of Szymanski’s mutual exclusion 
algorithm. In Sect. 3 we define composition and acceleration of actions. In Sect. 4 
we show how they can be used in verification of safety properties, illustrated by 
a verification of Szymanski’s algorithm. Section 5 contains conclusions and some 
non-resolved problems. 



2 Preliminaries 



In this section, we will introduce a generic system model which is intended to 
capture the behaviour of idealized versions of many existing mutual exclusion 
protocols, e.g. Dijkstra’s mutual exclusion algorithm, Lamport’s bakery algorithm,
Burn’s protocol, and Szymanski’s algorithm. In our model, a program consists of an
arbitrary number of identical processes, ordered in a linear array. The process 
behaviours are defined through a finite set of actions. An action represents a 
change of local state of a process. An action may be conditioned on both the 
local state of the process, and the context in which it may take place. The 
context represents a global condition on the local states of the remaining processes
inside the system. The ordering of the processes may reflect the actual physical 
ordering (e.g. Szymanski’s algorithm), or the values assigned to local variables 
inside processes (e.g. the ticket given to each process during the execution of 
Lamport’s bakery protocol). 

An idealized version of Szymanski’s mutual exclusion algorithm can be given 
as follows. In the algorithm, an arbitrary number of processes compete for a 
critical section. The local state of each process $i$ consists of a control state ranging
over the integers from 1 to 7 and of two boolean flags, $w_i$ and $s_i$. A process is
in the critical section when the control state is 7. A pseudo-code version of the
actions of any process $i$ could look as follows:

1 : await $\forall j : j \neq i : \neg s_j$
2 : $w_i, s_i$ := true, true
3 : if $\exists j : j \neq i : (pc_j \neq 1) \wedge \neg w_j$
      then $s_i$ := false; goto 4
      else $w_i$ := false; goto 5
4 : await $\exists j : j \neq i : s_j \wedge \neg w_j$ then $w_i, s_i$ := false, true
5 : await $\forall j : j \neq i : \neg w_j$
6 : await $\forall j : j < i : \neg s_j$
7 : $s_i$ := false; goto 1

For instance, according to the code at line 6, if the control state of a process
$i$ is 6, and the value of $s$ is false in all processes to the left, i.e. for all processes
$j < i$, then the control state of $i$ may be changed to 7. In a similar manner,
according to the code at line 4, if the control state of a process $i$ is 4, and if
the context is that there is at least one other process $j$ (either to the right or to
the left of $i$) where the value of $s_j$ is true and the value of $w_j$ is false, then the
control state, $w_i$ and $s_i$ in $i$ may be changed to 5, false, and true, respectively.
In fact in almost all the protocols that we have considered, contexts are defined 
by existentially or universally quantified formulas restricting the local states of 
processes to the left or to the right. In our model we work with a particular 
subclass of regular languages, which can capture such contexts. 
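
For a fixed number of processes, the pseudo-code above can be explored exhaustively. The following Python sketch does this under stated assumptions: the function names, the tuple encoding of local states as (pc, w, s), and the position-as-identity convention are all chosen here for illustration. It checks mutual exclusion only for the given n and says nothing about the parameterized problem treated in this paper.

```python
from collections import deque

def successors(state):
    """Yield the global states reachable by one atomic action.

    A global state is a tuple of local states (pc, w, s), one per process;
    a smaller index stands for a lower process identity (an assumption).
    """
    for i, (pc, w, s) in enumerate(state):
        others = [q for j, q in enumerate(state) if j != i]
        left = state[:i]
        nxt = None
        if pc == 1 and all(not sj for (_, _, sj) in others):
            nxt = (2, w, s)
        elif pc == 2:
            nxt = (3, True, True)
        elif pc == 3:
            if any(pcj != 1 and not wj for (pcj, wj, _) in others):
                nxt = (4, w, False)
            else:
                nxt = (5, False, s)
        elif pc == 4 and any(sj and not wj for (_, wj, sj) in others):
            nxt = (5, False, True)
        elif pc == 5 and all(not wj for (_, wj, _) in others):
            nxt = (6, w, s)
        elif pc == 6 and all(not sj for (_, _, sj) in left):
            nxt = (7, w, s)
        elif pc == 7:
            nxt = (1, w, False)
        if nxt is not None:
            yield state[:i] + (nxt,) + state[i + 1:]

def reachable(n):
    """Breadth-first exploration from the state where every process is at line 1."""
    init = ((1, False, False),) * n
    seen, todo = {init}, deque([init])
    while todo:
        for nxt in successors(todo.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def mutual_exclusion_holds(n):
    """True if no reachable state has two processes with control state 7."""
    return all(sum(pc == 7 for (pc, _, _) in st) <= 1 for st in reachable(n))
```

Calling `mutual_exclusion_holds(3)` explores every interleaving of three processes; the cost grows quickly with n, which is one motivation for the symbolic treatment that follows.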

A left context is a regular language which can be accepted by a deterministic 
finite-state automaton with a unique accepting state, and where all outgoing 
transitions from the accepting state are self-loops (transitions with identical
source and target states). A right context is a language such that the language of
reversed strings is a left context. The tail of a left context is the set of symbols 
that label self-loops from the accepting state. The tail of a right context is the 
tail of the left context which is its reverse language. 
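
The automaton condition in this definition is easy to test mechanically. The sketch below is illustrative only: the dictionary encoding of a deterministic transition function and the function name are assumptions made here. It inspects the given automaton, so a language may still be a left context even if one particular automaton for it fails the test.

```python
def tail_of_left_context(accepting, delta):
    """Return the tail if the automaton has the left-context shape, else None.

    accepting: set of accepting states
    delta: dict mapping (state, symbol) -> state (deterministic, possibly partial)
    """
    if len(accepting) != 1:
        return None                      # the accepting state must be unique
    (acc,) = accepting
    outgoing = {sym: dst for (src, sym), dst in delta.items() if src == acc}
    if any(dst != acc for dst in outgoing.values()):
        return None                      # some transition leaves the accepting state
    return set(outgoing)                 # symbols labelling the self-loops

# Automaton for c* a (a + b)*: q0 loops on c, moves to q1 on a; q1 loops on a and b.
LEFT_OK = {("q0", "c"): "q0", ("q0", "a"): "q1", ("q1", "a"): "q1", ("q1", "b"): "q1"}
# Automaton for (a + b)* a: the accepting state q1 is left again on b.
LEFT_BAD = {("q0", "a"): "q1", ("q0", "b"): "q0", ("q1", "a"): "q1", ("q1", "b"): "q0"}
```

For the first automaton the function returns the tail {a, b}; for the second it returns None.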

Examples of left contexts are regular expressions of the form

$e_0 f_1 e_1 f_2 \cdots f_n e_n$

where each $e_i$ is of form $(a_1 + \cdots + a_m)^*$, and where each $f_i$ is of form $(b_1 + \cdots + b_k)$
such that $b_j$ does not occur in the expression $e_{i-1}$, for any $j = 1, \ldots, k$.

Now, we give the formal definition of our model. We use a finite set $C$ of
colours to model the local states of processes. A program is a triple $\mathcal{P} = (C, \phi_I, \mathcal{A})$
where

$C$ is a finite set of colours,

$\phi_I$ is a regular expression denoting a set of initial configurations over $C$, and

$\mathcal{A}$ is a finite set of actions. An action is a triple of the form

$\phi_L ; r(c, c') ; \phi_R$

where $\phi_L$ is a left context, $\phi_R$ is a right context, and $r(c, c')$ is an idempotent
binary relation on $C$.







P.A. Abdulla et al. 



A configuration $\gamma$ of $\mathcal{P}$ is a string $\gamma[1]\,\gamma[2] \cdots \gamma[n]$ over $C$, where $\gamma[i]$ denotes
the local state of process $i$. For a regular expression $\phi$, we use $\gamma \in \phi$ to denote
that $\gamma$ is a string in the language denoted by $\phi$. For $i, j$ with $1 \leq i \leq j \leq n$, we use
$\gamma[i..j]$ to denote the substring $\gamma[i]\,\gamma[i+1] \cdots \gamma[j]$. An action

$\alpha = \phi_L ; r(c, c') ; \phi_R$

defines a relation $\alpha$ on configurations such that $\alpha(\gamma, \gamma')$ holds if $\gamma$ and $\gamma'$ are of
equal length $n$, and there is an $i$ with $1 \leq i \leq n$ such that $r(\gamma[i], \gamma'[i])$ holds,
$\gamma[1..i-1] = \gamma'[1..i-1] \in \phi_L$, and $\gamma[i+1..n] = \gamma'[i+1..n] \in \phi_R$. Thus, an
action corresponds to a (possibly nondeterministic) program statement in which
the colour at one position $i$ can be changed from some colour $c$ to some colour $c'$,
provided that $r(c, c')$ holds and that the string to the left of $i$ is in $\phi_L$ and that
the string to the right of $i$ is in $\phi_R$. We write $\gamma_1 \longrightarrow \gamma_2$ to denote that $\alpha(\gamma_1, \gamma_2)$
for some action $\alpha \in \mathcal{A}$. We use $\alpha^*$ and $\stackrel{*}{\longrightarrow}$ to denote the reflexive transitive closures of
$\alpha$ and $\longrightarrow$ respectively. A configuration $\gamma$ is said to be reachable if there is a
configuration $\gamma_I \in \phi_I$ such that $\gamma_I \stackrel{*}{\longrightarrow} \gamma$.
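
The one-step relation defined by an action can be sketched directly. In the Python fragment below, colours are single characters, contexts are written as Python regular expressions, and the relation r is a set of pairs; these encodings, the function name, and the use of the digits 1 to 7 for the seven colours of Table 1 are assumptions made for illustration.

```python
import re

def apply_action(config, phi_l, rel, phi_r):
    """Return all configurations reachable from config by one application.

    config: string over single-character colours
    phi_l, phi_r: regular expressions for the left and right context
    rel: set of pairs (c, c2) describing the allowed colour changes
    """
    result = set()
    for i, colour in enumerate(config):
        left, right = config[:i], config[i + 1:]
        if re.fullmatch(phi_l, left) and re.fullmatch(phi_r, right):
            for c, c2 in rel:
                if c == colour:
                    result.add(left + c2 + right)
    return result

# Line 6 of the pseudo-code: a process in colour 6 may move to colour 7
# if every process to its left has s = false (colours 1, 2 and 4 in Table 1).
LEFT6, REL6, RIGHT6 = r"[124]*", {("6", "7")}, r"[1-7]*"
```

With these definitions, `apply_action("16", LEFT6, REL6, RIGHT6)` yields the single successor "17", while "56" has no successor because the process to the left has s = true.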

The reachability problem is defined as follows. 

Instance A program $\mathcal{P}$ and a set of configurations of $\mathcal{P}$ represented by a
regular expression $\phi_F$.

Question Is any $\gamma \in \phi_F$ reachable?

In Fig. 1 we represent Szymanski’s algorithm as a program in our framework. 
To simplify the notation, we introduce the following syntactical notations. We 
let a colour be a triple $(pc, w, s)$, where $pc \in \{1, \ldots, 7\}$, and $w$ and $s$ are boolean.
We use predicates to define colours. For example, the predicate $(\neg s)$ denotes the
set of colours where the value of $s$ is equal to false, that is the set
$\{(pc, w, \mathit{false}) : pc \in \{1, \ldots, 7\} \text{ and } w \in \{\mathit{true}, \mathit{false}\}\}$.
We use the predicate true to denote the
set of all colours. We use guarded commands to represent binary relations on
colours. For instance, the command $(pc = 1) \longrightarrow pc := 2$ represents the relation
$\{((pc_1, w, s), (pc_2, w, s)) : (pc_1 = 1) \text{ and } (pc_2 = 2)\}$. Notice that e.g. at line 3
the left context $((pc = 1) \vee w)^* ((pc \neq 1) \wedge \neg w)\, \mathit{true}^*$ is equivalent to
$\mathit{true}^* ((pc \neq 1) \wedge \neg w)\, \mathit{true}^*$; however, we use the former expression in order to
be consistent with the definition of a left context.
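
The equivalence of the two expressions for the left context at line 3 can be sanity-checked on short strings. The sketch below abstracts each colour to one of two letters, which is an encoding introduced here for illustration: `g` stands for any colour satisfying (pc = 1) or w, and `n` for any colour satisfying (pc ≠ 1) and not w. It is a bounded check, not a proof.

```python
import re
from itertools import product

DETERMINISTIC = r"g*n[gn]*"    # ((pc = 1) or w)* ((pc != 1) and not w) true*
SIMPLE = r"[gn]*n[gn]*"        # true* ((pc != 1) and not w) true*

def agree_up_to(max_len):
    """Compare both expressions on every word of length at most max_len."""
    for length in range(max_len + 1):
        for word in map("".join, product("gn", repeat=length)):
            in_det = re.fullmatch(DETERMINISTIC, word) is not None
            in_simple = re.fullmatch(SIMPLE, word) is not None
            if in_det != in_simple:
                return False
    return True
```

Both expressions accept exactly the words that contain at least one `n`; the first one simply commits to the leftmost such position, which is what makes the automaton deterministic.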



3 Acceleration of Actions 



In this section we define an operation which computes the effect of an unbounded 
number of executions of an action. 

For an action $\alpha$, let $\alpha^*$ be the action constructed by repeating the action $\alpha$ an
arbitrary number of times. More precisely, $\alpha^*$ denotes the set of pairs $(\gamma, \gamma')$ of
configurations such that there exists a sequence $\gamma_0 \gamma_1 \gamma_2 \cdots \gamma_n$ of configurations
with $n \geq 0$ such that $\gamma = \gamma_0$, $\gamma' = \gamma_n$, and such that $\alpha(\gamma_i, \gamma_{i+1})$ for
$i = 0, 1, \ldots, n-1$. Similarly, we let $\alpha^+$ be the action constructed by repeating the
action $\alpha$ one or more times.

Fig. 1. Actions for Modelling Szymanski’s Algorithm



We shall now characterize $\alpha^+$ for any action $\alpha$. A characterization of $\alpha^*$ can
be obtained from a characterization of $\alpha^+$ by taking the union with the identity
relation.

Theorem 1. Let $\alpha$ be an action of the form

$\phi_L ; r(c, c') ; \phi_R$

where $r(c, c')$ is idempotent, and where $\Sigma_L$ ($\Sigma_R$) is the tail of $\phi_L$ ($\phi_R$). Then
$\alpha^+$ consists of the set of pairs $(\gamma, \gamma')$ of configurations of equal length (say $n$),
such that there are indices $i, j$ with $1 \leq i \leq j \leq n$ such that

1. $\gamma[1..i-1] = \gamma'[1..i-1] \in \phi_L$.

2. $\gamma[j+1..n] = \gamma'[j+1..n] \in \phi_R$.

3. $r(\gamma[i], \gamma'[i])$, $r(\gamma[j], \gamma'[j])$, and for each $k$ with $i < k < j$ we have $\gamma[k] = \gamma'[k]$
or $r(\gamma[k], \gamma'[k])$.

4. For each $k$ with $i < k \leq j$ we have $\gamma[k] \in \Sigma_R$ or $\gamma'[k] \in \Sigma_R$.

5. For each $k$ with $i \leq k < j$ we have $\gamma[k] \in \Sigma_L$ or $\gamma'[k] \in \Sigma_L$.

6. For all indices $k_1, k_2$ with $i \leq k_1 < k_2 \leq j$ we have $\gamma[k_1] \in \Sigma_L \vee \gamma[k_2] \in \Sigma_R$
and $\gamma'[k_1] \in \Sigma_L \vee \gamma'[k_2] \in \Sigma_R$. □
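
The characterization can be compared with a direct search on small instances. The following Python sketch does so under assumptions that are not part of the theorem: colours are single characters, contexts are Python regular expressions whose tails are passed in by hand, and r is restricted to relations in which no colour is both a source and a target, so that each position changes at most once. Indices are 0-based in the code.

```python
import re
from itertools import product

def step(cfg, phi_l, rel, phi_r):
    """One application of the action phi_l ; rel ; phi_r to cfg."""
    return {cfg[:i] + d + cfg[i + 1:]
            for i, c in enumerate(cfg)
            if re.fullmatch(phi_l, cfg[:i]) and re.fullmatch(phi_r, cfg[i + 1:])
            for (a, d) in rel if a == c}

def plus_by_search(cfg, phi_l, rel, phi_r):
    """Configurations reachable from cfg in one or more steps (reference result)."""
    seen, todo = set(), [cfg]
    while todo:
        for nxt in step(todo.pop(), phi_l, rel, phi_r):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def plus_by_conditions(g, h, phi_l, rel, phi_r, tail_l, tail_r):
    """Check conditions 1 to 6 for the pair (g, h)."""
    n = len(g)
    if len(h) != n:
        return False
    changed = lambda k: (g[k], h[k]) in rel
    for i in range(n):
        for j in range(i, n):
            ok = (g[:i] == h[:i] and re.fullmatch(phi_l, g[:i])              # 1
                  and g[j + 1:] == h[j + 1:] and re.fullmatch(phi_r, g[j + 1:])  # 2
                  and changed(i) and changed(j)                              # 3
                  and all(g[k] == h[k] or changed(k) for k in range(i + 1, j))
                  and all(g[k] in tail_r or h[k] in tail_r
                          for k in range(i + 1, j + 1))                      # 4
                  and all(g[k] in tail_l or h[k] in tail_l
                          for k in range(i, j))                              # 5
                  and all((g[k1] in tail_l or g[k2] in tail_r)
                          and (h[k1] in tail_l or h[k2] in tail_r)
                          for k1 in range(i, j + 1)
                          for k2 in range(k1 + 1, j + 1)))                   # 6
            if ok:
                return True
    return False

def agree(colours, max_len, phi_l, rel, phi_r, tail_l, tail_r):
    """Compare both computations on all configurations up to max_len."""
    for n in range(1, max_len + 1):
        configs = ["".join(p) for p in product(colours, repeat=n)]
        for g in configs:
            predicted = {h for h in configs
                         if plus_by_conditions(g, h, phi_l, rel, phi_r, tail_l, tail_r)}
            if predicted != plus_by_search(g, phi_l, rel, phi_r):
                return False
    return True
```

For example, with left and right context `a*` (tail {a} on both sides) and the single change a to b, two positions can never both change, and condition 6 is what excludes such pairs.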

In the symbolic reachability analysis (described in Sect. 4 ), we use regular 
expressions as representations of sets of configurations. The characterization of 
Theorem 1 can be used to model the effect of (repetitive applications of) actions 
on regular sets by using finite-state transducers. This approach is proposed in
[KMM+97], where however acceleration is not considered.

We recall that an action $\alpha$ denotes a set of pairs $(\gamma, \gamma')$ of configurations.
Equivalently, we can represent the action as a set of finite strings over $C \times C$,
namely as the strings $(c_1, c'_1)(c_2, c'_2) \cdots (c_n, c'_n)$ such that
$(c_1 c_2 \cdots c_n,\; c'_1 c'_2 \cdots c'_n) \in \alpha$. It is easy to see that each action can be represented
by a finite-state transducer.

More importantly, for any action $\alpha$ the characterization of Theorem 1 can
be used to find a representation of $\alpha^+$ in a straightforward way, since $\alpha^+$ can
be represented as a regular language over $C \times C$. As an example, in Fig. 2 we
show the transducer which accepts $\alpha^+$ where $\alpha$ is the action at line 3 in Fig. 1.
We note that the transducer in Fig. 2 need not use the full generality of the
characterization of Theorem 1, since the alphabets $\Sigma_L$ and $\Sigma_R$ both are equal
to the set of all colours.



Fig. 2. Transducer for $\alpha^+$ from line 3 of Fig. 1 (edge labels: ¬guard, copy, copy ∨ change).



In the figure we use ¬guard to denote pairs $\{((pc, w, s), (pc, w, s)) : pc \neq 1 \wedge \neg w\}$,
we use guard to denote pairs $\{((pc, w, s), (pc, w, s)) : pc = 1 \vee w\}$, we
use copy to denote pairs $((pc, w, s), (pc, w, s))$ of identical tuples, and change to
denote pairs $\{((pc, w, s), (pc', w', s')) : pc = 3 \wedge pc' = 4 \wedge w' = w \wedge s' = \mathit{false}\}$
that represent a change of local control state.

For a regular expression $\phi$ and an action $\alpha$, we use $\alpha^*(\phi)$ to denote the
regular expression we get by computing (in the usual way) the product of $\phi$ and
the transducer corresponding to $\alpha^*$.

In order to illustrate that the conditions in Theorem 1 characterize a regular 
relation between configurations, we show a representation of this relation in 
terms of a finite-state transducer. We show the part which is inserted between 
the accepting state $q_L$ of an automaton that copies strings in $\phi_L$ and the initial
state $q_R$ of an automaton that copies strings in $\phi_R$. In Fig. 3, we show the general
construction. Edges are labeled by predicates on pairs $(c, c')$ of colours that are
read. We use the abbreviations $c_L$ for $c \in \Sigma_L$, $c'_L$ for $c' \in \Sigma_L$, $c_R$ for $c \in \Sigma_R$, and
$c'_R$ for $c' \in \Sigma_R$. In addition to the transitions in the figure, there are self-loops
at states $q_1$, $q_2$, $q_3$, and $q_4$ labeled $c = c' \in \Sigma_L \cap \Sigma_R$. Informally, the states
correspond to the following situations.

— $q_1$ corresponds to a state when the transducer has read an index where some
change has occurred, but where so far there has been no index with change
at which $c \notin \Sigma_L \vee c' \notin \Sigma_L$.

— $q_2$ corresponds to a state when the transducer has read an index where some
change has occurred where $c \notin \Sigma_L$, but where so far there has been no index
with change at which $c' \notin \Sigma_L$.

— $q_3$ corresponds to a state when the transducer has read an index where some
change has occurred where $c' \notin \Sigma_L$, but where so far there has been no index
with change at which $c \notin \Sigma_L$.

— $q_4$ corresponds to a state when the transducer has read an index where some
change has occurred where $c' \notin \Sigma_L$ and some index where some change has
occurred where $c \notin \Sigma_L$.

Fig. 3. General transducer for $\alpha^+$



4 Verification 

In this section we show how the operation of acceleration, presented in the pre- 
ceding section, can be used to enhance a standard version of symbolic forward 
reachability analysis, whose purpose is to compute a representation of the set 
of reachable configurations. The analysis algorithm maintains a set of reach- 
able configurations, which is initially the set of initial configurations. In each 
step of the algorithm, the set of reachable configurations is extended with the 
configurations that can be reached by some action from a configuration in the 
current set. We use regular expressions to represent (potentially infinite) sets of
configurations. As we shall illustrate later in the section, this algorithm will not
terminate when applied to any of the protocols mentioned in the introduction.
To solve this problem, we use the operation $\alpha^*$ (defined in Sect. 3) to accelerate
the exploration of the state space. We recall that $\alpha^*$ computes the set of successors
corresponding to an arbitrary number of applications of an action (rather
than a single application).

Suppose we are given a program $\mathcal{P} = (C, \phi_I, \mathcal{A})$ and a regular expression $\phi_F$,
and that we want to check whether some configuration $\gamma \in \phi_F$ is reachable in $\mathcal{P}$.
For the current discussion, let us represent the set of configurations maintained
by the algorithm by a set $V$ of regular expressions. The set $V$ represents the
union of the sets denoted by all regular expressions in $V$. Initially, $V = \{\phi_I\}$. The
algorithm will now for each regular expression $\phi$ in $V$ and each action $\alpha$ compute
$\alpha^*(\phi)$ represented as a finite union of regular expressions. When a new expression
$\phi$ is generated, it is compared with those which are already in $V$. If $\phi \subseteq \phi'$ for
some $\phi' \in V$, then $\phi$ is discarded, since it will not add new configurations to the
explored state space (it is actually sufficient that $\phi \subseteq \bigcup_{\phi' \in V} \phi'$ for $\phi$ to be safely
discarded). In fact, we can also discard all $\phi' \in V$ with $\phi' \subseteq \phi$. It is also checked
whether $\phi$ has a non-empty intersection with $\phi_F$. If the intersection is non-empty,
the algorithm terminates, reporting that some configuration in $\phi_F$ is reachable.
Otherwise, the algorithm terminates when no new regular expressions can be
generated. Obviously, our algorithm is incomplete in the sense that while it will
always find reachable configurations in $\phi_F$, it will not necessarily terminate if all
configurations in $\phi_F$ are unreachable.
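
The loop described above can be sketched as follows. To keep the example runnable, sets of configurations are finite Python sets of strings of one fixed length rather than regular expressions, entailment is set inclusion, and $\alpha^*$ is computed by exhaustive search; all of these are stand-ins chosen for illustration, and the two toy actions are hypothetical. On genuinely infinite sets the loop need not terminate, as noted in the text.

```python
import re

def post_star(action, configs):
    """All configurations reachable from configs by repeating one action."""
    phi_l, rel, phi_r = action
    seen, todo = set(configs), list(configs)
    while todo:
        cfg = todo.pop()
        for i, colour in enumerate(cfg):
            if re.fullmatch(phi_l, cfg[:i]) and re.fullmatch(phi_r, cfg[i + 1:]):
                for c, c2 in rel:
                    nxt = cfg[:i] + c2 + cfg[i + 1:]
                    if c == colour and nxt not in seen:
                        seen.add(nxt)
                        todo.append(nxt)
    return frozenset(seen)

def reach(initial, actions, bad):
    """Return True if some configuration in bad is found, False at a fixpoint."""
    explored, todo = [], [frozenset(initial)]
    while todo:
        phi = todo.pop()
        if phi & bad:
            return True                       # non-empty intersection with bad
        if any(phi <= old for old in explored):
            continue                          # entailed by an existing set
        explored = [old for old in explored if not old <= phi]
        explored.append(phi)
        todo.extend(post_star(alpha, phi) for alpha in actions)
    return False

# Two toy actions over colours a, b, c (hypothetical, for illustration only).
A1 = (r"[ab]*", {("a", "b")}, r"[ab]*")   # a -> b while no c is present
A2 = (r"c*", {("b", "c")}, r"[abc]*")     # b -> c if everything to the left is c
```

With the initial set {"aaa"}, the call `reach({"aaa"}, [A1, A2], {"ccc"})` reports that "ccc" is reachable, while the set {"aca"} is found to be unreachable once no new sets are generated.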

We illustrate this algorithm through an application to Szymanski’s protocol. 
To simplify the notation we use the coding of colours shown in Table 1, so e.g.
$c_2$ corresponds to the colour $(2, \mathit{false}, \mathit{false})$. The set of initial configurations is
represented by $\phi_0 = c_1^*$.

First, we observe that the above standard reachability algorithm will run into
an infinite loop as follows. By applying action 1 to $\phi_0$ we get $c_1^* c_2 c_1^*$. Applying
action 1 again gives $c_1^* c_2 c_1^* c_2 c_1^*$, etc.
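
The divergence can be observed on fixed-size instances. The sketch below, which is only an illustration, writes the colours $c_1$ and $c_2$ as the characters "1" and "2", considers action 1 alone, and counts how many rounds of single-step exploration produce something new when starting from n processes in $c_1$.

```python
import re

def single_step_rounds(n):
    """Number of single-step rounds that add new configurations from c1^n."""
    reached = {"1" * n}
    frontier = set(reached)
    rounds = 0
    while frontier:
        new = set()
        for cfg in frontier:
            for i, colour in enumerate(cfg):
                rest = cfg[:i] + cfg[i + 1:]
                # Action 1: c1 -> c2, provided s is false in all other processes;
                # both c1 and c2 have s = false.
                if colour == "1" and re.fullmatch(r"[12]*", rest):
                    new.add(cfg[:i] + "2" + cfg[i + 1:])
        frontier = new - reached
        reached |= frontier
        if frontier:
            rounds += 1
    return rounds
```

The count equals n, so no fixed number of single-step rounds covers all system sizes at once, whereas one accelerated application of action 1 to $c_1^*$ covers every number of moved processes.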

Although the standard algorithm fails, using the acceleration operation leads 
to termination. In Table 2 we describe a simulation of our algorithm. We start 
from the set of initial configurations $\phi_0$. For each regular expression $\phi_i$ and action
$\alpha_j$, we compute $\alpha_j^*(\phi_i)$ or $\alpha_j^+(\phi_i)$ and add the resulting regular expressions to
the set of existing expressions. For instance, from $\phi_0$ only action 1 is enabled,
resulting in the configurations denoted by $\phi_1$. Whenever an expression is entailed
by another one, we indicate that in the table. In such a case,
the entailed expression is discarded and not explored further. At $\phi_2$, we
pursue two actions in one step, denoted by their union. The algorithm terminates
in 19 steps.

Colour   (pc, w, s)
$c_1$    (1, false, false)
$c_2$    (2, false, false)
$c_3$    (3, true, true)
$c_4$    (4, true, false)
$c_5$    (5, false, true)
$c_6$    (6, false, true)
$c_7$    (7, false, true)

Table 1. Coding of colours in analysis of Szymanski’s algorithm



5 Conclusions 

In the paper, we have presented techniques for reachability analysis of para- 
meterized systems where a configuration of the system can be described by a 
string representing the local states of the processes. We have found that naive 
symbolic reachability analysis does not converge for such systems, and propose 
to use acceleration of actions to obtain termination. We showed that using ac- 
celeration, symbolic reachability analysis terminates for an idealized version of 
Szymanski’s algorithm. We have also analyzed corresponding versions of other 
mutual exclusion algorithms, including Burn’s and Dijkstra’s mutual exclusion 
algorithms, and the bakery and ticket algorithms by Lamport. For some of these 
algorithms, we use variants of the acceleration operation presented in this paper: 
we perform the acceleration on the action obtained by sequentially composing 
two actions, and we also define an acceleration operation on actions that involve 
two adjacent processes which can be guarded by left and right contexts. 

We further note that we have considered idealized versions of the mutual 
exclusion algorithms. In most implementations of these algorithms, a global guard
(such as e.g. $\forall j : j < i : \neg s_j$) is not atomic: in a more refined description of
the algorithm this is a loop which checks the states of other processes. We have 
not considered how to treat the non-atomic versions of statements such as this 
one. 

Abstract. We address the problem of verifying systems operating on
different types of variables ranging over infinite domains. We consider in 
particular systems modeled by means of extended automata communi- 
cating through unbounded fifo channels. We develop a general methodol- 
ogy for analyzing such systems based on combining automatic generation 
of abstract models (not necessarily finite-state) with symbolic reachability
analysis. Reachability analysis procedures allow us to verify properties
automatically at the abstract level as well as to generate auxiliary
invariants and accurate abstraction functions that can be used at the 
concrete level. We propose a realization of this approach in a framework 
which extends PVS with automatic invariant checking strategies, auto- 
matic procedures for generating abstract models, as well as automata- 
based decision procedures and reachability analysis procedures for fifo 
channels systems. 



1 Introduction 

Communication protocols can be naturally modeled by automata communicating 
through fifo queues. However, in the modeling of such systems, we often need, 
besides queues, additional variables and data structures such as counters (for 
instance to memorize the number of messages sent) and timers (to check timeouts 
and introduce synchrony between processes). Hence, to reason about protocols, 
we need in general to analyze extended automata operating on variables which 
may range over several different domains. Moreover, the relevant domains may 
in general be infinite (e.g., in the case of counters, unbounded fifo channels, 
etc). We are here interested in automatic analysis of such heterogeneous infinite- 
state models, especially of extended automata with fifo channels. We develop a 
general analysis methodology for these models based on combining abstraction 
and symbolic reachability analysis. We show a realization of this methodology 

This work was partially funded by the National Science Foundation under grant
CCR-9509931 to SRI International and has been done while S. Bensalem and P.
Habermehl were visiting SRI International.

N. Halbwachs and D. Peled (Eds.): CAV’99, LNCS 1633, pp. 146-159, 1999. 

(c) Springer- Verlag Berlin Heidelberg 1999 




Verification of Infinite-State Systems



in a framework which extends the tool InVeSt [6] by automata-based decision 
procedures and reachability analysis techniques in order to deal with unbounded 
fifo channels. 

Several approaches can be adopted to analyze infinite-state systems. One of
them is abstraction. It consists in finding an abstraction function that allows us to
construct a faithful abstract model which can be automatically analyzed [13,12,23].
The problem is then how to find a suitable abstraction function and how to
derive automatically the abstract model from the concrete one. Several frameworks
that provide assistance for performing these tasks [18,15,19,5] have been
proposed. However, finding abstraction functions remains in general a nontrivial
problem which requires a deep understanding of the behavior of the system.
In some extreme cases, knowledge of the set of reachable configurations of the
system is needed.

Another approach is symbolic reachability analysis. It consists in computing
a finite representation of the set of all reachable configurations of the system.
If such a computation can be done, this approach allows us to solve verification
problems that are reducible to reachability problems (e.g., verification of safety
properties). Moreover, the generation of the set of reachable configurations yields,
for free and fully automatically, finite abstractions of the system. These
abstractions are usually finite transition graphs (called symbolic graphs) where
nodes represent sets of configurations. This approach has been applied to several
kinds of extended automata: timed automata and hybrid automata [4], counter
automata [14,8], pushdown automata [9,17], fifo-channel automata [3,7,10,2],
etc. However, the existing symbolic techniques concern in general homogeneous
models, i.e., models with one kind of variables ranging over unbounded data
domains. So, in order to apply these techniques to communication protocols, we
need in general an abstraction step that produces homogeneous models from
heterogeneous ones.

From the descriptions above of the two approaches, it appears that they are 
complementary. In this work, we propose to combine these two approaches: 

1. Using abstraction and automatic techniques for generating abstract models 
in order to obtain homogeneous models from heterogeneous ones. Notice that 
we need here methods that automatically generate abstract extended
automata that are not necessarily finite-state.

2. Using symbolic reachability analysis techniques in order to verify automati- 
cally properties at the abstract level, as well as to generate auxiliary invari- 
ants and abstraction functions that can be used at the concrete level. 

Let $\mathcal{M}$ be an extended automaton with variables of two different types $T_1$ and
$T_2$. To verify a property on $\mathcal{M}$, the ideal situation is that we are able to provide an
abstraction function $\rho$ on the whole state space of $\mathcal{M}$, to construct a corresponding
finite-state abstract model $\mathcal{M}_\rho$ which simulates $\mathcal{M}$, i.e., that contains for each
behavior of $\mathcal{M}$ a corresponding behavior, and to check that the property holds
on $\mathcal{M}_\rho$. As we mentioned above, it is in general hard to find such a $\rho$. However,
it is often the case that we have a way to define an abstraction function on the
variables of type $T_1$ (for instance by taking systematically a partition of their
space of values according to the predicates appearing in the model), whereas the
set of all reachable values of variables of type $T_2$ must still be analyzed precisely.

In such a case, we can start in a first step by applying an abstraction $\rho_1$ on the
variables of type $T_1$ and obtain a model $\mathcal{M}_{\rho_1}$ where all variables are of type $T_2$.
Then, in a second step, we apply to $\mathcal{M}_{\rho_1}$ a symbolic reachability analysis procedure
which is specific to extended automata of type $T_2$. This procedure computes
a representation of $Reach(\mathcal{M}_{\rho_1})$ which is the set of all reachable configurations
in $\mathcal{M}_{\rho_1}$. Then, the result of this computation can be used in different ways:

— Generation of invariants and verification of invariance properties: To show
that every reachable configuration of $\mathcal{M}$ satisfies a property $\varphi$, we can show
that $Reach(\mathcal{M}_{\rho_1}) \subseteq \rho_1(\varphi)$ at the abstract level. Moreover, $\rho_1^{-1}(Reach(\mathcal{M}_{\rho_1}))$
is an invariant of the concrete model $\mathcal{M}$ which can help in establishing invariance
properties at the concrete level.

— Construction of finite abstractions of $\mathcal{M}_{\rho_1}$: we can consider the partition
of $Reach(\mathcal{M}_{\rho_1})$ according to control states, or any finer partition $\pi$, and
construct the corresponding symbolic graph $(\mathcal{M}_{\rho_1}/\pi)$. This graph can be
used to check properties in the universal fragments of temporal logics (in
particular, linear-time properties).

— Generation of abstraction functions: Verifying a safety property on $\mathcal{M}$ can be
reduced to checking a reachability property on a system $\mathcal{M} \times O$ where $O$ is
an observer (finite automaton expressing the property). Then, each time we
consider a new observer $O$, we need to define an abstraction function on $\mathcal{M} \times O$.
The knowledge of the possible contents of the variables of $\mathcal{M}$ (considered
separately) can help in constructing such an abstraction function.

Clearly, in order to apply this methodology, we should have for each type of
variables we consider a symbolic representation that allows us to represent and
manipulate infinite sets of configurations. These structures must support some
basic operations such as boolean operations and the computation of successors
and predecessors. We also need decision procedures on these representations for
solving emptiness, membership, and entailment problems. These procedures are 
needed during the symbolic reachability analysis, as well as during the construc- 
tion of the abstract models for given abstraction functions. Furthermore, since 
we must reason at the concrete level on heterogeneous models and combine the 
results of analysis of several kinds of variables, we need a general and uniform 
framework where all the representation structures we consider can be embedded. 
In this paper, we consider a framework based on InVeSt-PVS, and we show how 
this framework is extended in order to support the methodology we describe 
above for extended automata with fifo queues. 

The original framework we consider offers the logical language of PVS [26] 
which is based on higher order logic, where extended automata can be specified. 
Also, various decision procedures are available in this framework, in particular 
for linear arithmetic. The tool InVeSt [6] provides strategies for checking
invariants, as well as an automatic procedure for constructing in a compositional
manner abstract models for given abstraction functions [5]. During the construc- 
tion of abstract models, the existing implementation of this procedure in InVeSt 
invokes PVS (and its decision procedures) in order to decide the existence of
the transitions between abstract states. This procedure is reasonably efficient as 
long as the considered data structures in the models can be described in theories 
for which PVS provides decision procedures. For instance, this procedure can be 
applied efficiently in the case of integer counters with linear operations. However, 
when we consider sequence variables like fifo channels, the use of the PVS-based 
implementation of this procedure becomes ad-hoc and cumbersome. This is due 
to the lack in PVS of decision procedures on regular languages. Indeed, sets of 
contents of sequence variables can be naturally represented by means of finite- 
state automata or regular expressions, and many of the proof goals concerning 
these variables could be solved as emptiness or entailment problems of regular 
languages. 

A contribution of this work is to propose an embedding of regular languages in
InVeSt-PVS. This embedding consists of a theory of regular expressions that makes
it possible to express in the PVS language constraints like $x \in L$ where $x$ is of type sequence
and $L$ is a regular language, a connection to the tool Amore [24] (which provides
a library of procedures on finite-state automata), and an automatic procedure
for computing abstract models using automata-based decision procedures.

Another contribution of this work is an extension of InVeSt by an automatic
reachability analysis procedure for (lossy) fifo-channel systems. This extension is
made through a connection to the tool Lcs [1], which computes the set
of reachable configurations of a lossy fifo-channel system by means of a symbolic
representation based on regular expressions, following the procedure introduced
in [2]. The tool Lcs can also generate finite abstract models automatically
as symbolic graphs.

We illustrate the use of our framework on the examples of the Alternating 
Bit Protocol and the Bounded Retransmission Protocol. 

2 Extended Fifo-Channel Automata 

We consider in this paper untimed models of protocols that are parallel compo- 
sitions of extended automata communicating through unbounded lossy queues. 

An extended fifo-channel automaton $A$ consists of a set of control locations
$Q$, a vector of typed variables, an initial control location $q_{init} \in Q$, a set $V_{init}$
of initial vectors of values, and a set of transitions $S$ between control locations.
Each transition is labelled by a guarded command. The guard is a predicate on 
the variables and defines an enabling condition of the transition. The command 
is a transformation (assignment) of the variables. 

Among the variables, we distinguish fifo channels. We suppose that messages
in these channels are in a finite alphabet $\Gamma$, and we consider the following
operations: the emptiness test $empty(c)$ (true if the channel $c$ is empty), $c!a$ (send $a$
to channel $c$), and $c?a$ (receive $a$ from $c$ provided $a$ is the first symbol in $c$), for
any $a \in \Gamma$. We do not fix the types of the other variables. They may range over
finite or infinite domains (booleans, integers, etc.).

A configuration of the system is a tuple $(q, v)$ where $q \in Q$ is a control
location and $v$ is a valuation of the variables. Notice that the valuation of each
channel is a word on the alphabet $\Gamma$. Let $\Sigma_A$ be the set of configurations of $A$.
The set of the initial configurations of $A$ is $Init_A = \{q_{init}\} \times V_{init}$.

The semantics of the model is defined by means of a transition relation $R_A \subseteq
\Sigma_A \times \Sigma_A$ between configurations. We assume that the channels are lossy in the
sense that they can lose messages at any time. Hence, in the definition of $R_A$, the
execution of a transition in $S$ can be preceded and followed by losses of messages
(contents of channels may decrease according to the subsequence ordering). The
formal definition of $R_A$ is standard and is omitted here. Then, we associate with
$A$ the transition system $S_A = (\Sigma_A, Init_A, R_A)$.
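The channel operations and the lossy semantics described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a channel content is a tuple of messages with the oldest message first, and the helper names (`send`, `receive`, `is_subsequence`) are hypothetical, not part of InVeSt, PVS or Lcs.

```python
from typing import Optional, Tuple

Channel = Tuple[str, ...]  # channel content, oldest message first (assumption)


def empty(c: Channel) -> bool:
    """Emptiness test empty(c)."""
    return len(c) == 0


def send(c: Channel, a: str) -> Channel:
    """c!a: append message a at the tail of the channel."""
    return c + (a,)


def receive(c: Channel, a: str) -> Optional[Channel]:
    """c?a: enabled only if a is the first symbol; None means 'not enabled'."""
    if c and c[0] == a:
        return c[1:]
    return None


def is_subsequence(u: Channel, w: Channel) -> bool:
    """True if u can be obtained from w by losing messages (subsequence ordering)."""
    it = iter(w)
    # 'x in it' advances the iterator, so the order of messages is respected.
    return all(x in it for x in u)
```

With this encoding, a lossy step from content `w` may lead to any `u` such that `is_subsequence(u, w)` holds, before or after an ordinary `send` or `receive`.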

We consider the two usual functions $post_A, pre_A : 2^{\Sigma_A} \to 2^{\Sigma_A}$ such that,
for any set of configurations $X \subseteq \Sigma_A$, $post_A(X)$ (resp. $pre_A(X)$) is the set of
immediate successors (resp. predecessors) of $X$ in $S_A$. We denote by $\widetilde{post}$ and
$\widetilde{pre}$ the dual functions of $post$ and $pre$, i.e., $\widetilde{\phi}(X) = \overline{\phi(\overline{X})}$ for $\phi \in \{post, pre\}$. We
denote by $post^*$ and $pre^*$ the reflexive-transitive closures of $post$ and $pre$. The
set of all reachable configurations of $A$ is defined by $Reach(A) = post_A^*(Init_A)$.
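For a finite transition system these functions can be computed directly. The following Python sketch is only meant to make the definitions concrete; it assumes that the transition relation is given explicitly as a set of pairs, which is of course not the case for the infinite-state models considered in this paper.

```python
def post(R, X):
    """Immediate successors of the set X under the relation R."""
    return {t for (s, t) in R if s in X}


def pre(R, X):
    """Immediate predecessors of the set X under the relation R."""
    return {s for (s, t) in R if t in X}


def dual(f, R, universe, X):
    """Dual of f (post or pre): complement of f applied to the complement of X."""
    return universe - f(R, universe - X)


def reach(R, init):
    """post*(init): least set containing init and closed under post."""
    seen = set(init)
    frontier = set(init)
    while frontier:
        frontier = post(R, frontier) - seen
        seen |= frontier
    return seen


# Small example with four states (hypothetical data).
U = {"s0", "s1", "s2", "s3"}
R = {("s0", "s1"), ("s1", "s2"), ("s2", "s1"), ("s3", "s0")}
reachable = reach(R, {"s0"})
```

Here `dual(pre, R, U, X)` returns the states all of whose successors lie in `X`, which is the reading of the dual of $pre$ used for concretization in Section 3.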

3 Abstractions and Invariants 

Invariants Let $S = (\Sigma, Init, R)$ be a transition system. We say that $\varphi \subseteq \Sigma$ is an
invariant of $S$ if $Reach(S) \subseteq \varphi$. Checking that $\varphi$ is an invariant of $S$ consists in
finding an auxiliary invariant $\psi$ such that $post_R(\psi) \subseteq \psi$, $Init \subseteq \psi$, and $\psi \subseteq \varphi$.

Of course, one possible $\psi$ is the set $Reach(S)$ itself. However, it is not always
possible to have an effective way to construct a representation of $Reach(S)$ in
a theory where $Reach(S) \subseteq \varphi$ can be checked effectively. Alternatively, one can
use abstractions to prove invariance properties on abstract models for which the 
set of reachable configurations can be computed in such a theory. This set can 
also be used to define an auxiliary invariant at the concrete level. 
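The role of the auxiliary invariant can be illustrated on a finite example. The sketch below assumes explicit finite sets; the function name `proves_invariant` and the example data are hypothetical. It checks the three conditions stated above.

```python
def post(R, X):
    """Immediate successors of the set X under the relation R."""
    return {t for (s, t) in R if s in X}


def proves_invariant(R, init, psi, phi):
    """True if psi is inductive, contains init, and is included in phi."""
    return init <= psi and post(R, psi) <= psi and psi <= phi


# Hypothetical example: phi ("never s4") holds but is not inductive by itself,
# because the unreachable state s3 has a transition to s4.
R = {("s0", "s1"), ("s1", "s2"), ("s2", "s1"), ("s3", "s0"), ("s3", "s4")}
init = {"s0"}
phi = {"s0", "s1", "s2", "s3"}
psi = {"s0", "s1", "s2"}  # a stronger, inductive auxiliary invariant
```

In this example `phi` cannot be used as its own auxiliary invariant, whereas `psi` (which is here exactly the reachable set) establishes it.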

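Abstractions Consider two transition systems $S_i = (\Sigma_i, Init_i, R_i)$ with $i = 1, 2$. A
Galois connection¹ between $\Sigma_1$ and $\Sigma_2$ is a pair $(\alpha, \gamma)$ of functions $\alpha : 2^{\Sigma_1} \to 2^{\Sigma_2}$
and $\gamma : 2^{\Sigma_2} \to 2^{\Sigma_1}$ such that $\alpha(X_1) \subseteq X_2$ iff $X_1 \subseteq \gamma(X_2)$, for every $X_i \subseteq \Sigma_i$.
We call $\alpha$ the abstraction function and $\gamma$ its concretization.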

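Given a Galois connection $(\alpha, \gamma)$, we say that $S_2$ is an abstraction of $S_1$,
denoted by $S_1 \sqsubseteq_{(\alpha,\gamma)} S_2$, if $\alpha(Init_1) \subseteq Init_2$ and $\alpha \circ post_{R_1} \circ \gamma \subseteq post_{R_2}$. We
say also in this case that $S_2$ $(\alpha, \gamma)$-simulates $S_1$. We write $S_1 \sqsubseteq S_2$ ($S_2$ simulates
$S_1$) if there exists a Galois connection $(\alpha, \gamma)$ such that $S_1 \sqsubseteq_{(\alpha,\gamma)} S_2$.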

An efficient way to describe Galois connections consists in giving a total
relation $\rho \subseteq \Sigma_1 \times \Sigma_2$. Indeed, it is shown in [23] that such a relation induces the
Galois connection $(post_\rho, \widetilde{pre}_\rho)$. It is easy to check that in case $\rho$ is a function,
the concretization $\widetilde{pre}_\rho$ coincides with $pre_\rho$, which we will tacitly denote by $\rho^{-1}$.
In the sequel, we will write $S_1 \sqsubseteq_\rho S_2$ instead of $S_1 \sqsubseteq_{(post_\rho, \widetilde{pre}_\rho)} S_2$. Moreover,
we will also refer to $\rho \subseteq \Sigma_1 \times \Sigma_2$ as the abstraction relation (function) as the
distinction between $\rho$ and $post_\rho$ is easily made from the context.
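The Galois connection induced by a relation can be checked exhaustively on a small finite example. The following Python sketch assumes finite sets of concrete and abstract states; the function names and the parity abstraction are illustrative choices, not taken from [23].

```python
from itertools import chain, combinations


def post_rel(rho, X):
    """Abstraction: all abstract states related to some element of X."""
    return {b for (a, b) in rho if a in X}


def pre_dual_rel(rho, concrete, Y):
    """Concretization: concrete states whose related abstract states all lie in Y."""
    return {a for a in concrete if all(b in Y for (a2, b) in rho if a2 == a)}


def powerset(s):
    """All subsets of s, as tuples."""
    items = list(s)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def is_galois(rho, concrete, abstract):
    """Check: post_rel(X) included in Y  iff  X included in pre_dual_rel(Y), for all X, Y."""
    return all(
        (post_rel(rho, set(X)) <= set(Y)) == (set(X) <= pre_dual_rel(rho, concrete, set(Y)))
        for X in powerset(concrete)
        for Y in powerset(abstract)
    )


# Hypothetical example: abstracting small integers by their parity.
concrete = {0, 1, 2, 3}
abstract = {"even", "odd"}
rho = {(n, "even" if n % 2 == 0 else "odd") for n in concrete}
```

Since `rho` is a function here, `pre_dual_rel(rho, concrete, Y)` is simply the inverse image of `Y`.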

¹ The use of Galois connections and abstract interpretation as a general and unifying
framework for abstraction techniques was first proposed in [13].
Checking invariance properties under abstraction It can be shown that if $S_1 \sqsubseteq_{(\alpha,\gamma)}
S_2$ then $Reach(S_1) \subseteq \gamma(Reach(S_2))$. Hence, if $S_1 \sqsubseteq_{(\alpha,\gamma)} S_2$ and $Reach(S_2) \subseteq \varphi_2$
then $Reach(S_1) \subseteq \gamma(\varphi_2)$. Therefore, in order to check that $Reach(S_1) \subseteq \varphi_1$ for
some $\varphi_1 \subseteq \Sigma_1$, it suffices to find a Galois connection $(\alpha, \gamma)$ and a system $S_2$ such
that $S_1 \sqsubseteq_{(\alpha,\gamma)} S_2$, and to check $post_{R_1}(\gamma(Reach(S_2)) \cap \varphi_1) \subseteq \varphi_1$. Notice that the
last condition holds immediately in case $\gamma(Reach(S_2)) \subseteq \varphi_1$ since $S_1 \sqsubseteq_{(\alpha,\gamma)} S_2$
implies $post_{R_1}(\gamma(Reach(S_2))) \subseteq \gamma(Reach(S_2))$. This is the standard preservation
result concerning invariance properties [12, 23]. Notice also that in case $\varphi_1$ is an
invariant of $S_1$, there must exist $S_2$ and $(\alpha, \gamma)$ which fulfill the conditions above.

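Since function composition is monotone and $\subseteq$ is transitive, $S_1 \sqsubseteq_{(\alpha,\gamma)} S_2$ and
$S_2 \sqsubseteq_{(\alpha',\gamma')} S_3$ implies $S_1 \sqsubseteq_{(\alpha' \circ \alpha, \gamma \circ \gamma')} S_3$. Then, we can consider a hierarchy of
abstractions, that is, a sequence $S_1 \sqsubseteq_{(\alpha_1,\gamma_1)} S_2 \cdots \sqsubseteq_{(\alpha_n,\gamma_n)} S_{n+1}$ with $n \geq 1$, in
order to check properties at different levels of abstraction, and derive auxiliary
invariants (for every $i$, $\gamma_1 \circ \cdots \circ \gamma_i(Reach(S_{i+1}))$ is an invariant of $S_1$).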

4 Computing Abstract Models 

Given a system of extended automata $A = A_1 \| \cdots \| A_n$ and an abstraction
relation $\rho \subseteq \Sigma_A \times \Sigma_a$ between concrete and abstract states, we want to construct
an abstract system $A_\rho = A_1^a \| \cdots \| A_n^a$ such that $S_A \sqsubseteq_\rho S_{A_\rho}$. For that, we adopt
the method presented in [5], which consists in considering separately each concrete
transition $\tau_c$, and constructing its corresponding abstract transitions $\tau_a$. This is
achieved by starting from the universal relation between abstract states and
eliminating transitions that do not correspond to concrete ones. Given two abstract
states $a_1$ and $a_2$ and a concrete transition $\tau_c$ of $A$, if the condition

$$\rho^{-1}(a_1) \subseteq \widetilde{pre}_{\tau_c}\big(\overline{\rho^{-1}(a_2)}\big) \qquad (1)$$

holds, then the transition $\tau_a$ between $a_1$ and $a_2$ is removed. In the existing
implementation of this method in the InVeSt tool, condition (1) is checked using
PVS. In order to enhance the efficiency of the method, it is safe to partition 
the abstract variables and to compute an abstract transition for each partition
separately. The global abstract transition is then obtained by conjunctively
composing these abstract transitions. An interesting property of this technique
is that it makes it possible to deal with variables of different types separately, and to use for
each of these types specific methods and decision procedures to check Condition
(1). Moreover, it makes it possible to deal with abstraction relations that behave as the
identity on variables that range over infinite domains. This is necessary if we want
to abstract for instance an extended automaton with counters and fifo channels, 
without abstracting the channels (we abstract counters and get an unbounded 
fifo-channels system). 
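The elimination principle can be sketched on a finite example. In the Python fragment below, the check of condition (1) is performed by explicit set operations in its complementation-free form (the intersection of $\rho^{-1}(a_1)$ with the predecessors of $\rho^{-1}(a_2)$ is empty); in InVeSt this check is delegated to PVS, and a proof that does not succeed simply leaves the abstract transition in place. The function names and the counter example are assumptions made for illustration.

```python
def pre(tau, X):
    """Concrete predecessors of X under the single transition relation tau."""
    return {s for (s, t) in tau if t in X}


def concretize(rho, a):
    """Inverse image of the abstract state a under the abstraction function rho."""
    return {s for s, b in rho.items() if b == a}


def abstract_transition(tau, rho, abstract_states):
    """Start from the universal relation and remove pairs with no concrete witness."""
    kept = {(a1, a2) for a1 in abstract_states for a2 in abstract_states}
    for (a1, a2) in list(kept):
        # Complementation-free form of condition (1): empty intersection.
        if not (concretize(rho, a1) & pre(tau, concretize(rho, a2))):
            kept.discard((a1, a2))
    return kept


# Hypothetical example: a counter with the transition x < 5 -> x := x + 1,
# abstracted by the predicate "x = 0".
tau = {(x, x + 1) for x in range(5)}
rho = {x: ("zero" if x == 0 else "pos") for x in range(6)}
abstract_tau = abstract_transition(tau, rho, {"zero", "pos"})
```

On this example only the abstract transitions from `zero` to `pos` and from `pos` to `pos` survive.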

Now, to compute abstract transitions, we need to compute abstractions of
concrete guards, as well as concretizations of abstract states. In case the
abstraction relation is given by a predicate, these operations involve quantifier
elimination which can be computationally costly, when possible. Therefore, we
only consider abstraction functions which are given by an expression of the
form $\bigwedge_{i=1}^{n} \bigwedge_{j} (\varphi_{ij} \Rightarrow a_i = exp_{ij})$, where $a_1, \dots, a_n$ are the abstract variables,
the $\varphi_{ij}$'s form a partition of $\Sigma_A$ for every $i = 1, \dots, n$, and the $\varphi_{ij}$'s and $exp_{ij}$'s only
involve concrete variables. Moreover, we require that for every literal $l$ occurring
in a guard of some transition of the concrete system there exist $i$ and $j$ such
that $\varphi_{ij}$ is $l$. This ensures that we can compute an over-approximation of the
abstraction of a guard simply by substituting every literal by its corresponding
function $a_i = exp_{ij}$.

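Let us now consider the concretization function. If the abstraction relation $\rho$
is given in the form above, then it is a total function and $\rho^{-1}(\varphi^A) = \widetilde{pre}_\rho(\varphi^A) = pre_\rho(\varphi^A)$
is easily computed by $\bigwedge_{i=1}^{n} \bigwedge_{j} (\varphi_{ij} \Rightarrow \varphi^A[exp_{ij}/a_i])$.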

Now, given a concrete model, we can compute an abstraction function $\rho$ which
satisfies the requirements above by introducing an abstract boolean variable
$a_l$ for each literal $l$ occurring in some guard and defining $\rho$ by the formula
$\bigwedge_l a_l \equiv l$. In order to avoid an explosion in the number of abstract variables,
literals that refer to the same set of concrete variables are checked to see whether they
form a partitioning of $\Sigma_A$. In case $n$ literals $l_1, \dots, l_n$ form such a partitioning, a
single abstract variable ranging over $\{1, \dots, n\}$ is introduced instead of $n$ boolean
variables. Moreover, it is also possible to consider the predicates occurring in the 
assignments to obtain a finer abstraction function. 



Abstracting queues 

The accuracy of the abstract model obtained by applying the method presented 
above strongly depends on the proof strategy used to check condition (1). In our 
experience, the use of the decision procedures and proof strategies of PVS leads 
to reasonable results unless recursive functions and recursive data types are used. 
Now, since queues range over lists of values, it seems to be natural to encode 
them using lists and define abstractions on them using recursive functions. This, 
however, may lead to unnecessarily cumbersome definitions and require ad-hoc 
proof strategies as is the case for the Alternating Bit Protocol (ABP) and the 
clever abstraction used by Müller and Nipkow [25].

This abstraction is based on the observation that the content of the channels 
is always of the form Hence, if a finite alphabet of messages is considered, 

one can obtain a finite state abstraction of the ABP by merging adjacent identical 
messages. 
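The idea of merging adjacent identical messages can be written as a one-line function. The sketch below assumes that a channel content is given as a Python string with one character per message; it illustrates the principle only and is not the PVS definition discussed next.

```python
from itertools import groupby


def merge_adjacent(content: str) -> str:
    """Collapse every maximal run of identical messages into a single message."""
    return "".join(key for key, _run in groupby(content))
```

For instance, `merge_adjacent("0001111")` yields `"01"`, so all channel contents consisting of a block of 0s followed by a block of 1s are mapped to the same abstract value.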

We have specified the ABP in the specification language of PVS. We specified
channels as variables of type list over Mes, where Mes is a finite set of messages.
Sending and receiving messages are then specified using list operations such as car,
cdr, and cons. Müller and Nipkow’s abstraction can then be specified using a
recursively defined function. Using InVeSt and the proof strategies of PVS, we
have computed a finite abstraction of ABP for Mes = {0,1}. The difficulty in
this exercise has been to find a suitable proof strategy that handles the involved 
lists and the recursively defined functions. If we had used regular expressions, 
we could have specified the abstraction function very easily and we could have 
constructed the abstract system fully automatically without providing any par- 
ticular strategy using decision procedures on regular languages. 
5 Embedding Regular Languages in InVeSt-PVS

To be able to efficiently handle in InVeSt systems with fifo-channels we introduce
PVS-theories for queues and regular languages, and we embed automata-based
decision procedures. These extensions allow us to naturally represent sets of
contents (sequences of messages) of fifo channels by means of regular expressions,
which simplifies the definition of abstraction functions on queues as well as the
construction of the corresponding abstract models.

5.1 Extending the specification language of PVS 

A theory for queues is introduced on the theory of finite sequences which is
pre-defined in PVS. This new theory includes the definition of a polymorphic
type queue[Mes] and the definitions of the polymorphic functions add, front and
remove. Using these functions, sending a message (resp. receiving a message) can
be specified by the guarded command $true \to c := add(m, c)$ (resp. $front(c) =
m \to c := remove(c)$).

Then, we introduce a theory to deal with extended regular expressions with
the standard operations (Kleene star, concatenation, union), as well as positive
Kleene star ($\cdot^+$), intersection, complementation and right-quotient ($\cdot^{-1}$). This
theory makes it possible to express language constraints like $c \in L$ where $c$ is a queue
variable and $L$ a language given by an extended regular expression on the set
of message symbols Mes. Using this theory, we can specify abstraction functions
on queues by a formula of the form $\bigwedge_{i=1}^{n} c \in L_i \Rightarrow c_a = a_i$, where $Chcont^a =
\{a_1, \dots, a_n\}$ is a finite set of abstract values, $c_a$ is the abstract variable associated
with the queue $c$, and $L_1, \dots, L_n$ are regular expressions which form a partition
of $Mes^*$. For example, the Müller-Nipkow abstraction function on the channels
of the ABP (see Section 4) can be defined straightforwardly using this theory
($c \in 0^+ \cdot 1^+ \Rightarrow c_a = 01 \wedge c \in 1^+ \cdot 0^+ \Rightarrow c_a = 10 \wedge \cdots$). Notice that our
PVS theory on regular expressions is also useful for the representation of sets of
reachable states calculated symbolically (see Section 6).
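An abstraction function of this shape can be mimicked with ordinary pattern matching. The Python sketch below uses the standard `re` module as a stand-in for the PVS theory; only the cases for $0^+ \cdot 1^+$ and $1^+ \cdot 0^+$ come from the formula above, while the remaining cases (`eps`, `0`, `1`, `other`) are assumptions added here so that the languages cover $Mes^*$ for Mes = {0,1}. Note that `re` only tests membership; it offers no decision procedure for emptiness or inclusion of regular languages.

```python
import re

# Pairs (abstract value, regular expression); the languages are pairwise disjoint.
# Only "01" and "10" are taken from the text; the other cases are assumptions.
PARTITION = [
    ("eps", r""),
    ("0", r"0+"),
    ("1", r"1+"),
    ("01", r"0+1+"),
    ("10", r"1+0+"),
]


def abstract_queue(c: str) -> str:
    """Return the abstract value a_i such that the content c belongs to L_i."""
    for value, regex in PARTITION:
        if re.fullmatch(regex, c):
            return value
    return "other"  # complement of the union of the languages above
```

For example, `abstract_queue("0011")` returns `"01"` and `abstract_queue("010")` returns `"other"`.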

5.2 Calculating the abstract operations on a fifo-channel 

Let Mes be a set of messages, let $c$ be a queue variable, and let $\rho$ be its abstraction
function defined by a formula as above. Following the method described in Section 4,
given an operation $op$ on $c$ (i.e., $op \in \{c!m, c?m \mid m \in Mes\}$), we compute
the abstract operation $op_a$ by checking conditions of the form (1), which is equivalent
to checking the complementation-free condition $\rho^{-1}(c_a = a_i) \cap pre_{op}(\rho^{-1}(c_a =
a_j)) = \emptyset$. By definition of $\rho$, $\rho^{-1}(c_a = a_i)$ corresponds to $L_i$ (represented in our
PVS theory by the predicate $c \in L_i$). Hence, the condition above is equivalent
to $L_i \cap pre_{op}(L_j) = \emptyset$. It is easy to see that $pre_{op}$ can be defined in terms of basic
operations on regular languages: $pre_{c!m}(L) = L \cdot m^{-1}$ and $pre_{c?m}(L) = m \cdot L$.
Hence, checking the condition above consists in checking emptiness problems on
regular languages, which can be done automatically by invoking decision procedures
for extended regular expressions. For that, we have connected InVeSt to
the tool Amore [24] which can handle regular expressions and decide problems
like emptiness, inclusion, etc.
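To make the two $pre_{op}$ operations and the emptiness check concrete, here is a small Python sketch on deterministic finite automata. It is a stand-in written for illustration and does not reflect the interface of Amore; the class `DFA`, the function names and the example automata are all assumptions. Transition functions are partial (a missing entry rejects), and states are assumed never to be `None`.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass
class DFA:
    start: object
    accepting: FrozenSet[object]
    delta: Dict[Tuple[object, str], object]  # partial: a missing entry rejects


def states(d: DFA):
    return {d.start} | {q for (q, _a) in d.delta} | set(d.delta.values())


def pre_send(d: DFA, m: str) -> DFA:
    """Right quotient L . m^-1: words w such that w followed by m is in L."""
    acc = frozenset(q for q in states(d) if d.delta.get((q, m)) in d.accepting)
    return DFA(d.start, acc, d.delta)


def pre_receive(d: DFA, m: str) -> DFA:
    """Concatenation m . L: a fresh start state reads m, then behaves like d."""
    fresh = ("pre", m)  # assumed not to clash with an existing state
    delta = dict(d.delta)
    delta[(fresh, m)] = d.start
    return DFA(fresh, d.accepting, delta)


def intersection_is_empty(d1: DFA, d2: DFA, alphabet: str) -> bool:
    """Explore the product automaton, looking for a jointly accepting state."""
    seen = {(d1.start, d2.start)}
    stack = [(d1.start, d2.start)]
    while stack:
        p, q = stack.pop()
        if p in d1.accepting and q in d2.accepting:
            return False
        for a in alphabet:
            p2, q2 = d1.delta.get((p, a)), d2.delta.get((q, a))
            if p2 is None or q2 is None:
                continue
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                stack.append((p2, q2))
    return True


# Example languages over {0,1}: 0+, 1+ and 0+1+.
L0 = DFA("A", frozenset({"B"}), {("A", "0"): "B", ("B", "0"): "B"})
L1 = DFA("A", frozenset({"B"}), {("A", "1"): "B", ("B", "1"): "B"})
L01 = DFA("A", frozenset({"C"}),
          {("A", "0"): "B", ("B", "0"): "B", ("B", "1"): "C", ("C", "1"): "C"})
```

With these automata, `intersection_is_empty(L0, pre_send(L01, "1"), "01")` is false, so sending 1 can lead from the abstract content 0 to the abstract content 01, whereas `intersection_is_empty(L1, pre_send(L01, "1"), "01")` is true and the corresponding abstract transition is removed.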
We repeated the ABP example using our PVS theories on queues and regular
expressions, and the InVeSt-Amore connection. Not only is writing the abstraction
function significantly simpler within this framework, but the time for
computing the abstract models was also reduced from $\approx$ 45 minutes to 11 seconds.

6 Reachability Analysis: The tool Lcs (Lossy Channel
Systems)

6.1 Computing reachability sets 

The set $Reach(A)$ of any lossy fifo-channel automaton is recognizable but not
effectively constructible (there is no algorithm that computes a representation
of this set for any $A$) [11]. Hence, we adopt a semi-algorithmic approach
based on computing successively representations of an increasing sequence of
(lower) approximations of $Reach(A)$, by adding at each step the immediate
successors (post-images) of the configurations computed so far, and using
acceleration techniques [22,13] in order to enhance the chance of convergence. When
our procedure terminates, it delivers a structure representing precisely the set
$Reach(A)$. The acceleration principle we adopt is based on computing in one
step the effect of executing a control loop an arbitrary number of times (control 
loops are considered as meta-transitions [8,7]). 

To realize this approach, we need symbolic representation structures of sets 
of configurations which allow finite representations of the infinite sets we are 
interested in, which are effectively closed under union and post, which have a 
decidable entailment problem, and moreover, which allow the computation and 
the representation of images by meta-transitions (the effects of control loops). 
Another important feature of such a representation structure is to be
normalizable, i.e., for every representable set, there is a unique normal (or
canonical) representation which can be derived from any alternative representation.
Indeed, all operations (e.g., entailment testing) are often easier to perform on 
normal forms. Furthermore, normality (canonicity) often corresponds to a notion 
of minimality, which is crucial for practical reachability analysis procedures. 

In [2], we have introduced a symbolic representation structure based on a
class of regular expressions called SREs, for use in the computation of the sets
of reachable configurations of lossy fifo-channel automata. An SRE is either the
empty set $\emptyset$ or a finite union of products; each of these products is either the
empty string $\epsilon$ or a finite concatenation $e_1 \cdots e_n$ of atomic expressions, which
can be either of the form $(a + \epsilon)$ or $(a_1 + \cdots + a_m)^*$. We showed in [2] that
SREs satisfy all the needed features mentioned above: they characterize exactly
the class of reachability sets of lossy channel automata, and there are simple
and efficient (polynomial) procedures for normalization, for entailment testing,
for computing post-images, and for computing the effect of any control loop.
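The syntax of SREs is simple enough to be encoded directly. The Python sketch below represents an atomic expression as a tagged pair, a product as a tuple of atoms, and an SRE as a list of products, and tests membership by translating each product to a pattern of the standard `re` module. The encoding and the function names are assumptions made for illustration (messages are assumed to be single characters); the normalization and entailment procedures of [2] are not reproduced here.

```python
import re

# Atomic expressions (hypothetical encoding):
#   ("opt", "a")              stands for (a + epsilon)
#   ("star", frozenset("ab")) stands for (a + b)*


def atom_regex(atom) -> str:
    kind, arg = atom
    if kind == "opt":
        return "(?:%s)?" % re.escape(arg)
    if kind == "star":
        if not arg:  # star of the empty alphabet denotes {epsilon}
            return ""
        return "[%s]*" % "".join(re.escape(a) for a in sorted(arg))
    raise ValueError("unknown atomic expression: %r" % (atom,))


def product_regex(product) -> str:
    """A product is a concatenation of atoms; the empty product denotes epsilon."""
    return "".join(atom_regex(atom) for atom in product)


def sre_contains(sre, word: str) -> bool:
    """An SRE is a finite union of products; the empty list denotes the empty set."""
    return any(re.fullmatch(product_regex(p), word) is not None for p in sre)


# Example: the SRE 0* 1* over single-character messages.
zeros_then_ones = [(("star", frozenset("0")), ("star", frozenset("1")))]
```

For example, `sre_contains(zeros_then_ones, "0011")` holds while `sre_contains(zeros_then_ones, "10")` does not.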

Based on the results of [2], we have implemented in a tool called Lcs [1] a 
procedure for computing reachability sets using the following principle: Given 
a parallel composition of a set of fifo-channel automata, the procedure starts 
from the initial configurations and constructs the set of all reachable
configurations by applying a depth-first-search strategy through a symbolic transition
graph. The nodes of this graph are symbolic states, i.e., representations of sets
of configurations as pairs of the form $(q, E)$, where $q$ is a control location and
$E$ is an SRE-based representation of the contents of the channels: If the system
has $n$ channels, $E$ is a finite set $\{(e_1^1, \dots, e_n^1), \dots, (e_1^m, \dots, e_n^m)\}$ of $n$-dimensional vectors
of SREs representing the set $[\![E]\!] = \bigcup_{j=1}^{m} [\![e_1^j]\!] \times \cdots \times [\![e_n^j]\!]$, where $[\![e]\!]$ denotes
the language described by the expression $e$. At each step, the procedure
computes the immediate successors (post-images) of the current symbolic state by
all possible transitions of the automaton, and considers them according to the
depth-first-search ordering. When a control loop is detected, an acceleration is
performed by computing the effect of iterating the considered control loop on
the current symbolic state. The set of encountered configurations is memorized
progressively. After computing the post-image of a symbolic state, the procedure
checks whether the obtained symbolic state is covered by the set of configurations
computed so far. If this is the case, the successors of this symbolic state
are not generated. Notice that this procedure can also be used for on-the-fly
verification of safety or invariance properties.
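The overall shape of this exploration (depth-first search with a covering check) can be sketched as follows. The Python fragment is a strong simplification and not the Lcs implementation: symbolic states are pairs of a control location and a finite set of channel contents instead of SREs, no acceleration of control loops is performed, and the toy system bounds the channel length to 2 so that the search terminates. All names and the example system are hypothetical.

```python
def explore(init, successors):
    """Depth-first exploration; a symbolic state already covered is not expanded."""
    covered = {}  # control location -> contents encountered so far
    stack = [init]
    while stack:
        q, contents = stack.pop()
        if contents <= covered.get(q, frozenset()):
            continue  # covered by the configurations computed so far
        covered[q] = covered.get(q, frozenset()) | contents
        stack.extend(successors(q, contents))
    return covered


def successors(q, contents):
    """Toy lossy system with one channel over {a}: p sends a or moves to q, q receives a."""
    out = []
    # Loss of one message at an arbitrary position.
    lossy = frozenset(w[:i] + w[i + 1:] for w in contents for i in range(len(w)))
    if lossy:
        out.append((q, lossy))
    if q == "p":
        # Sending is bounded to length 2 here (artificial bound, stands in for acceleration).
        out.append(("p", frozenset(w + "a" for w in contents if len(w) < 2)))
        out.append(("q", contents))
    else:
        out.append(("q", frozenset(w[1:] for w in contents if w.startswith("a"))))
    return out


reachable = explore(("p", frozenset({""})), successors)
```

In the actual procedure the artificial bound is unnecessary, since the effect of iterating a sending loop is captured in one step by a starred atomic expression.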

6.2 Constructing Finite Abstract Models 

Computing the set of reachable configurations can be used to generate finite
abstract models. Let $A$ be a fifo-channel automaton. Let $\Phi$ be a finite set of symbolic
states of $A$ (see definition in the previous paragraph). Then, the symbolic graph
associated with $\Phi$ is the finite-state transition system $G_\Phi = (\Phi, Init_\Phi, R_\Phi)$ such
that $Init_\Phi$ is the set of all symbolic states containing an initial configuration
of $Init_A$, and $\forall \phi_1, \phi_2 \in \Phi.\ \phi_1 R_\Phi \phi_2$ iff $\exists \sigma_1 \in \phi_1, \sigma_2 \in \phi_2.\ \sigma_1 R_A \sigma_2$.

The canonical symbolic graph of $A$ corresponds to the partition of $Reach(A)$
according to the control states, i.e., $\Phi_A = \{(q_1, E_1), \dots, (q_m, E_m)\}$ where $Q =
\{q_1, \dots, q_m\}$ and $Reach(A) = \bigcup_{i=1}^{m} \{q_i\} \times [\![E_i]\!]$. It is easy to see that for every
set $\Phi$ of symbolic states which covers $Reach(A)$, i.e., $Reach(A) \subseteq \bigcup_{\phi \in \Phi} \phi$, we have
$S_A \sqsubseteq G_\Phi$ ($G_\Phi$ simulates $S_A$). This fact holds in particular for the canonical
symbolic graph $G_{\Phi_A}$, as well as for all the symbolic graphs obtained from refinements
of the partition $\Phi_A$.

The tool Lcs allows the automatic construction of the canonical symbolic 
graph of a given fifo-channel system. The construction of this graph is done 
during the construction of the reachability set. 
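For a finite set of configurations, the symbolic graph associated with a set of symbolic states can be computed by a direct transcription of its definition. The Python sketch below assumes explicit finite sets; the function name `symbolic_graph`, the block names and the example data are hypothetical.

```python
def symbolic_graph(R, init, blocks):
    """blocks maps a name to a set of configurations (one symbolic state each)."""
    # Initial symbolic states: those containing an initial configuration.
    init_blocks = {name for name, block in blocks.items() if block & init}
    # Edge between two symbolic states iff some concrete transition links them.
    edges = {
        (n1, n2)
        for n1, b1 in blocks.items()
        for n2, b2 in blocks.items()
        if any((s, t) in R for s in b1 for t in b2)
    }
    return init_blocks, edges


# Hypothetical example: configurations are pairs (control location, channel content),
# and the partition follows the control locations.
R = {
    (("p", ""), ("p", "a")),
    (("p", "a"), ("p", "")),
    (("p", "a"), ("q", "a")),
    (("q", "a"), ("q", "")),
}
init = {("p", "")}
blocks = {
    "P": {("p", ""), ("p", "a")},
    "Q": {("q", "a"), ("q", "")},
}
init_blocks, edges = symbolic_graph(R, init, blocks)
```

The resulting graph has the edges P to P, P to Q and Q to Q, and every concrete run is matched by a path through the blocks containing its configurations.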

Example: The Alternating Bit Protocol can be modeled by two automata, a 
sender and a receiver, communicating through two lossy channels K and L. We 
applied the procedure of the Lcs tool to the ABP; it terminated and
generated the set of reachable configurations given in Table 1, as well as the
canonical symbolic graph. The execution time is 0.07 seconds (UltraSparc). This
symbolic graph is then reduced after hiding all internal actions to a cyclic graph 
with two transitions, a SND followed by a RCV, which shows that the protocol 
behaves as a one-place buffer. 
Table 1. Reachability set of the ABP



7 Combining Abstraction and Reachability Analysis 

7.1 From the Concrete to the Abstract ...

The first possible combination of abstraction and reachability analysis is to apply
these two techniques sequentially. Given a heterogeneous model $A$ of a system,
say a parallel composition of extended automata with counters and fifo-channels,
the first step consists in applying an abstraction $\rho$ in order to get an (unbounded)
fifo-channel system $A_\rho$, and the second step consists in applying the symbolic
reachability analysis method in order either to check directly (on-the-fly) an
invariance property on the abstract fifo-channel model $A_\rho$, or to generate a finite
abstraction of this model which can be used for finite-state model-checking.

We illustrate this approach on the example of the Bounded Retransmission 
Protocol (BRP). Detailed descriptions of the BRP can be found in [21,20,16]. 
The BRP is a data link protocol whose service consists in transmitting large 
files (sequences of data of arbitrary lengths) from one client to another one. 
Each datum is transferred in a separate frame. Both clients, the sender and 
the receiver, obtain an indication whether the whole file has been delivered 
successfully or not. We model this protocol by means of two automata, a sender 
and a receiver, communicating through two lossy channels K and L. The BRP 
can be seen as an extended version of the ABP. However, one of the specific 
features of the BRP is that the model of the sender uses integer counters: First, 
it has a counter which gives the index i of the current frame in the considered 
file. This counter indicates whether the current frame is the first one,
the last one, or some intermediate frame. When the sender does not receive an 
acknowledgment, it may resend the same message up to a maximal number of 
retransmissions MAX which is a parameter of the protocol. Hence, the sender 
uses a counter CR for counting retransmissions. 

Starting from this model, we use InVeSt-PVS to generate automatically an 
abstract model which is an unbounded fifo-channel system. For that, we consider 
an abstraction function on the integer variables and parameters. This abstraction 
function is defined automatically from the guards appearing in the model (as 
shown in section 4). Then, the corresponding abstract unbounded fifo-channel 
model is computed automatically and compositionally. The execution times for 
computing the abstract sender and receiver are 64.71 and 10.47 sec. 

Then, in a second step, we apply the Lcs tool to the obtained abstract 
fifo-channel model. The Lcs tool constructs automatically the set of reachable 
configurations and the canonical symbolic graph. The execution time for these 
operations is 0.56 seconds (UltraSparc). After hiding internal actions and mini-
mization, we obtain a finite transition system (5 states and 10 transitions) which 
is used to model-check service properties of the BRP such as: between two con- 
secutive requests, the sender and the receiver must deliver indications of success 
or failure to their clients, the receiver delivers a failure indication only if an 
abortion (by the sender) has occurred, etc. 

7.2 ... and Back

Strengthening of Invariants Suppose that we want to show that all configurations
of $A$ satisfy a property $\varphi$ expressed as a predicate on the variables of $A$. Then,
given an abstraction function $\rho$, we can consider the corresponding abstract
model $A_\rho$ and compute $Reach(A_\rho)$. Since $\rho^{-1}(Reach(A_\rho))$ is an invariant of $A$
(see Section 3), to solve our verification problem on $A$ it suffices to show that
$post_A(\rho^{-1}(Reach(A_\rho)) \wedge \varphi) \Rightarrow \varphi$.

Let $Reach(A_\rho) = \{(q_i, e_1^i, \dots, e_n^i) \mid i = 1, \dots, m\}$. Notice that the $e_j^i$'s
may be empty sets (when the $q_i$'s are not reachable). Notice also that control
locations in $A_\rho$ correspond to finite abstractions of variables (e.g., counters)
in $A$. Then, the concretization $\rho^{-1}(Reach(A_\rho))$ is given by the formula
$\bigvee_{i=1}^{m} \big(\rho^{-1}(q_i) \wedge \bigwedge_{j=1}^{n} c_j \in e_j^i\big)$, which can be written in PVS using our theory
on regular expressions. Now, it is worth noticing that in general the formula $\varphi$
we want to check does not constrain the contents of the channels. The conjunction
of the formula above with $\varphi$ strengthens this formula according
to the fact that some control locations in $A_\rho$ are not reachable and expresses
constraints on the variables of $A$ that are not channels.

Generating and Reusing Abstraction Functions Safety properties can be expressed
by means of observers. An observer $O$ is an extended automaton which
runs in parallel with the system and observes its behaviors without interfering
with them. Then, invariance properties can be checked on the synchronous
parallel composition $A \times O$ of the system and its observer. In general, we may
consider the same system $A$ with several observers. Then, it is interesting to compute
information about $A$ that can be reused each time we consider a composed
system $A \times O$. In particular, we can use our symbolic reachability analysis of
fifo-channel systems in order to derive information on the contents of the channels
of $A$ (notice that usually, observers are not fifo-channel systems since they
represent service properties; however, they may have unbounded local variables
like counters). If $A$ is itself a heterogeneous system, to obtain the information
about the contents of the channels of $A$, we start by applying an abstraction in
order to get a fifo-channel system $A_\rho$ and we compute $Reach(A_\rho)$. Then, the
information we compute makes it possible to define once and for all an abstraction function
on the channels which can be used, each time we consider an observer $O$, in the
definition of a finite abstraction of the system $A \times O$.

Indeed, given a description of the set of reachable configurations of a
fifo-channel system $A_\rho$, we can define systematically an abstraction function on
its channels: Let $Reach(A_\rho) = \{(q_j, e_1^j, \dots, e_n^j) \mid j = 1, \dots, m\}$. Then, for each
$i = 1, \dots, n$, let $\mathcal{C}_i = \{e_i^1, \dots, e_i^m\}$, and let $\mathcal{R}_i = \bigcup_{j=1}^{m} e_i^j$ (the set of all possible
contents of $c_i$). We let $P_i$ denote the coarsest partition of $\mathcal{R}_i$ which is compatible
with the collection $\mathcal{C}_i$ (i.e., $\forall e \in \mathcal{C}_i.\ \exists p_1, \dots, p_k \in P_i.\ e = p_1 \cup \dots \cup p_k$) and we
consider a finite set of abstract contents $A_i = \{a_p \mid p \in P_i\}$. Then, we define the
abstraction function $\rho_i : \Gamma^* \to A_i \cup \{\bot\}$ such that: $\forall w \in \Gamma^*.\ (\bigwedge_{p \in P_i} w \in p \Rightarrow
\rho_i(w) = a_p) \wedge w \notin \mathcal{R}_i \Rightarrow \rho_i(w) = \bot$.
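The construction of the coarsest compatible partition and of the induced abstraction function can be illustrated on finite sets. In the Python sketch below, finite sets of words stand in for the regular languages denoted by the SREs (an assumption made to keep the example executable; on regular languages the same construction relies on boolean operations on automata), the block $p$ itself plays the role of the abstract value $a_p$, and `None` plays the role of $\bot$.

```python
def coarsest_partition(collection):
    """Group the words of the union by the subset of sets of the collection containing them."""
    sets = [frozenset(e) for e in collection]
    by_signature = {}
    for w in frozenset().union(*sets):
        signature = tuple(w in e for e in sets)
        by_signature.setdefault(signature, set()).add(w)
    return {frozenset(block) for block in by_signature.values()}


def make_abstraction(collection):
    """Return rho mapping a word to its block, or to None (bottom) outside the union."""
    blocks = coarsest_partition(collection)

    def rho(w):
        for p in blocks:
            if w in p:
                return p  # the block itself serves as the abstract value a_p
        return None

    return rho


# Hypothetical contents of one channel at two control locations.
collection = [{"", "0", "00"}, {"0", "00", "01"}]
rho = make_abstraction(collection)
```

Two words fall in the same block exactly when no set of the collection separates them, which makes every set of the collection a union of blocks and the partition as coarse as possible.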

Notice that the abstraction functions we generate this way can be written in 
the PVS language using our theory on regular expressions, and hence, they can 
be composed with other abstraction functions concerning other variables. 

As an illustration, consider again the example of the ABP. Starting from the
set of reachable configurations given in Table 1 which was computed by the Lcs
tool, we generate using the definition above abstraction functions $\rho_K$ and $\rho_L$ for
the channels K and L. These two functions are equal and coincide exactly with
the Müller-Nipkow abstraction function (see Sections 4 and 5). Based on this
abstraction function, we can define an abstraction function of the ABP composed 
with an observer which checks that the input and output streams coincide. 

8 Conclusion 

We have developed a methodology for verifying infinite-state systems by com- 
bining automatic abstraction techniques and symbolic reachability analysis pro- 
cedures. We have illustrated the application of this methodology on the case 
of extended fifo-channels systems. For that, we have extended the tool InVeSt 
by automata-based decision procedures and reachability analysis techniques for 
fifo-channels systems. 

The method we propose for combining abstractions and reachability analysis 
can be applied for any type of variables or combination of (interdependent) types 
corresponding to decidable (mixed) theories, and for which there are symbolic 
representations and procedures for reachability analysis. Hence a crucial issue is 
to identify such decidable theories and the corresponding representation struc- 
tures, and to design efficient symbolic reachability analysis procedures based on 
these representations. Indeed, improving the power and the efficiency of these
procedures makes it possible to simplify the needed abstraction steps.

References 

1. P. Abdulla, A. Annichini, and A. Bouajjani. Symbolic Verification of Lossy Channel
Systems: Application to the Bounded Retransmission Protocol. In TACAS’99.
LNCS 1579, 1999.

2. P. Abdulla, A. Bouajjani, and B. Jonsson. On-the-fly Analysis of Systems with
Unbounded, Lossy Fifo Channels. In CAV’98. LNCS 1427, 1998.

3. P.A. Abdulla and B. Jonsson. Verifying Programs with Unreliable Channels.
Inform. and Comput., 127(2):91-101, 1996.

4. R. Alur, C. Courcoubetis, N. Halbwachs, T. Henzinger, P. Ho, X. Nicollin, A. Olivero,
J. Sifakis, and S. Yovine. The Algorithmic Analysis of Hybrid Systems. TCS,
138, 1995.







5. S. Bensalem, Y. Lakhnech, and S. Owre. Computing Abstractions of Infinite State
Systems Compositionally and Automatically. In CAV’98. LNCS 1427, 1998.

6. S. Bensalem, Y. Lakhnech, and S. Owre. InVeSt: A Tool for the Verification of
Invariants. In CAV’98. LNCS 1427, 1998.

7. B. Boigelot and P. Godefroid. Symbolic Verification of Communication Protocols
with Infinite State Spaces using QDDs. In CAV’96. LNCS 1102, 1996.

8. B. Boigelot and P. Wolper. Symbolic Verification with Periodic Sets. In CAV’94.
LNCS 818, 1994.

9. A. Bouajjani, J. Esparza, and O. Maler. Reachability Analysis of Pushdown Automata:
Application to Model Checking. In CONCUR’97. LNCS 1243, 1997.

10. A. Bouajjani and P. Habermehl. Symbolic Reachability Analysis of FIFO-Channel
Systems with Nonregular Sets of Configurations. In ICALP’97. LNCS 1256, 1997.

11. Gérard Cécé, Alain Finkel, and S. Purushothaman Iyer. Unreliable Channels Are
Easier to Verify Than Perfect Channels. Inform. and Comput., 124(1):20-31, 1996.

12. E.M. Clarke, O. Grumberg, and D.E. Long. Model checking and abstraction. ACM
TOPLAS, 16(5), 1994.

13. P. Cousot and R. Cousot. Static Determination of Dynamic Properties of Recursive
Procedures. In IFIP Conf. on Formal Description of Programming Concepts.
North-Holland Pub., 1977.

14. P. Cousot and N. Halbwachs. Automatic Discovery of Linear Restraints among
Variables of a Program. In POPL’78. ACM, 1978.

15. D. Dams, R. Gerth, and O. Grumberg. Generation of Reduced Models for Checking
Fragments of CTL. In CAV’93. LNCS 697, 1993.

16. P. D’Argenio, J-P. Katoen, T. Ruys, and G.J. Tretmans. The Bounded Retransmission
Protocol must be on Time. In TACAS’97. LNCS 1217, 1997.

17. A. Finkel, B. Willems, and P. Wolper. A Direct Symbolic Approach to Model
Checking Pushdown Systems. In Infinity’97, 1997.

18. S. Graf and C. Loiseaux. A Tool for Symbolic Program Verification and Abstraction.
In CAV’93. LNCS 697, 1993.

19. S. Graf and H. Saidi. Construction of Abstract State Graphs with PVS. In CAV’97,
volume 1254 of LNCS, 1997.

20. J-F. Groote and J. Van de Pol. A Bounded Retransmission Protocol for Large
Data Packets. In AMAST’96. LNCS 1101, 1996.

21. L. Helmink, M.P.A. Sellink, and F. Vaandrager. Proof checking a Data Link Protocol.
In Types for Proofs and Programs. LNCS 806, 1994.

22. R.M. Karp and R.E. Miller. Parallel Program Schemata: A Mathematical Model
for Parallel Computation. In Switch. and Automata Theory Symp. IEEE, 1967.

23. C. Loiseaux, S. Graf, J. Sifakis, A. Bouajjani, and S. Bensalem. Property Preserving
Abstractions for the Verification of Concurrent Systems. FMSD, 6(1), 1995.

24. Oliver Matz, Axel Miller, Andreas Potthoff, Wolfgang Thomas, and Erich Valkema.
Report on the Program AMoRE. Technical Report 9507, Inst. f. Informatik u.
Prakt. Math., CAU Kiel, 1995.

25. O. Müller and T. Nipkow. Combining Model Checking and Deduction for
I/O-Automata. In TACAS’95. LNCS 1019, 1995.

26. S. Owre, J. Rushby, N. Shankar, and F. von Henke. Formal verification for
fault-tolerant architectures: Prolegomena to the design of PVS. IEEE Transactions on
Software Engineering, 21(2):107-125, Feb. 1995.




Experience with Predicate Abstraction



Satyaki Das¹, David L. Dill¹, and Seungjoon Park²

¹ Computer Systems Laboratory, Stanford University, Stanford, CA 94305
² RIACS, NASA Ames Research Center, Moffett Field, CA 94035



Abstract. This paper reports some experiences with a recently implemented
prototype system for verification using predicate abstraction, based on
the method of Graf and Saidi [9]. Systems are described using a language
of iterated guarded commands, called Murφ-- (since it is a simplified
version of our Murφ protocol description language). The system makes
use of two libraries: SVC [1] (an efficient decision procedure for
quantifier-free first-order logic) and the CMU BDD library. The use of these
libraries increases the scope of problems that can be handled by predicate
abstraction through increased efficiency, especially in SVC, which is
typically called thousands of times. The verification system also provides
limited support for quantifiers in formulas. The system has been applied
successfully to two nontrivial examples: the Flash multiprocessor cache
coherence protocol, and a concurrent garbage collection algorithm.
Verification of the garbage collection algorithm required proving simple
properties of graphs, which was also done using predicate abstraction.



1 Introduction 

Abstraction is emerging as the key to formal verification of large designs,
especially designs that are not finite-state. Predicate abstraction, first described by
Graf and Saidi [9], provides a means for combining theorem proving and model
checking techniques by automatically mapping an unbounded system (called the
concrete system) to a finite state system (called the abstract system). The states
of the abstract system correspond to truth assignments to a set of predicates.
The user must supply the predicates and properties to be proven. The system
automatically model checks the properties on the abstract system defined by
the predicates. The abstraction is conservative, meaning that if a property is
shown to hold on the abstract system, there is a concrete version of the property
that holds on the concrete system; however, if the property fails to hold on the
abstract system, it may or may not hold on the concrete system.

We have recently implemented a prototype system for efficient verification
of invariants by predicate abstraction, to discover how far predicate abstraction
can take us towards the goal of formal verification of real systems. Results have
been encouraging. Systems are described using a language of iterated guarded
commands, which we call Murφ-- (since it is a simplified version of our Murφ
protocol description language). The system makes use of two libraries: an efficient
decision procedure for quantifier-free first-order logic, called SVC [1], and the
CMU BDD library written by David Long. The use of these libraries increases
the scope of problems that can be handled by predicate abstraction through
increased efficiency, especially in SVC, which is typically called thousands of
times. The prototype verifier is written in Common Lisp, and the libraries (which
are written in C and C++) are called via the “foreign function” interface.

and NASA contract NASI-98139. The content of this paper does not necessarily
reflect the position or the policy of the Government and no official endorsement

We have applied it successfully to two nontrivial examples: the Flash
multiprocessor cache coherence protocol, and a concurrent garbage collection
algorithm. In verification, discovering strategies for effective use of a tool is often as
important as the design of the tool. We quickly found that we needed limited
support for quantifiers, for expressing properties of unbounded numbers of
processes and data. For the garbage collection algorithm, it was necessary to prove
some properties of a recursive function. Interestingly, some recursive algorithms
can be verified by translating them to Murφ-- and using predicate abstraction.

The more detailed description below has programs written in a syntax other
than Murφ--, and logical formulas in a syntax other than SVC. The benefits of
readability were deemed to outweigh the possibility of translation errors.



Related Work 

Our work is derived from the Graf/Saidi abstraction scheme [9]. However, the
original implementation represented the abstract state space as a set of monomials
(a monomial is a product of Boolean variables and negated variables). Instead,
we use BDDs, which usually represent Boolean functions more efficiently.
Graf and Saidi also sacrificed some accuracy by representing the image
of a monomial under a transition rule as a single monomial which must cover
all of the states in the image of the transition rule. Our method has no such
restriction. So, our verifier is more accurate, but may require more computation
(which is performed more efficiently).

Our approach to handling parameterized systems uses quantified formulas
(similar to [17] and [13]), which differs from the method presented in [12]. They
used linear systems of equations to deal with state transitions. The basic idea is
that for each state there is an abstract variable which keeps track of the number
of processes in that state. So if a process moves from $q$ to $q'$ then the value of
$X_q$ is decremented by one while $X_{q'}$ is incremented by one. We have handled
reasoning about parameterized systems by introducing formulas quantified over
the replicated processes as abstract state variables. This is similar to what was
proposed in [8] and [7].
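
The counting scheme attributed to [12] above can be illustrated with a small Python sketch. It is only an illustration of the idea of one counter per local state; the state names and the helper `move` are hypothetical and do not come from [12] or from our prototype.

```python
from collections import Counter

def move(counts, q, q_next):
    """Move one process from local state q to q_next in the counter abstraction."""
    if counts[q] <= 0:
        raise ValueError(f"no process in state {q!r}")
    updated = Counter(counts)
    updated[q] -= 1        # X_q is decremented by one
    updated[q_next] += 1   # X_q' is incremented by one
    return updated

# Hypothetical system with three processes, all initially idle.
counts = Counter({"idle": 3, "waiting": 0})
after = move(counts, "idle", "waiting")
```

The total number of processes is preserved by every move, which is the kind of linear relation such a method can exploit.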

Another approach to generating abstract state graphs is to abstract the con- 
crete rules [3]. This has the advantage of requiring fewer validity checks (as they 
are required when constructing the abstract transitions). However, abstracting 
the rules may also lose more information about the concrete system, and so 
might be unable to prove the invariant of interest.

2 Predicate Abstraction

This section summarizes the theory of predicate abstraction and its implemen- 
tation in the prototype verifier. The notation is somewhat different from Graf 
and Saidi’s, but everything is very similar until the details of the computation 
of the successors of a set of abstract states (the recursive decomposition). 



The Concrete and Abstract Descriptions 

As with previous work in this area, the concrete system is modeled as a collection 
of iterated nondeterministic commands. There is a single global state variable X 
that represents the complete state of the system. Multiple state variables can be 
represented by making them fields of a variable of record type. The initial state 
of the concrete system is generated by an assignment X := init(X).¹

There is a set of transition rules. Each rule defines a transition function $f$
which maps states to states (the input language has guarded commands, but the
guards are not necessary since the transition functions can be defined to leave
state variables unchanged when their guards are not satisfied).

An execution of the system is a sequence of states, $q_0, \ldots, q_n, q_{n+1}, \ldots$ where
$q_0 = \mathit{init}(q_{-1})$ for some arbitrary state $q_{-1}$ (note that $q_{-1}$ does not occur in the
execution sequence) and $q_{n+1} = f(q_n)$ for some transition function $f$. A concrete
state $q$ is reachable if it appears in some execution sequence. We are interested
in whether predicates on the state variables are invariants, meaning that they
hold for every reachable state of a concrete system.

An abstract system is defined by a concrete system and a set of N predicates,
$\phi_1, \phi_2, \ldots, \phi_N$. Each state $q_A$ of the abstract state space is a truth assignment
to the indices 1 through N (so the set of states is finite). The predicates define
an abstraction function, $\alpha$, which maps concrete states to abstract states. In
particular, $q_A = \alpha(q_C)$ whenever $\forall i : q_A(i) = \phi_i(q_C)$. An abstract state $q_A$ is
reachable if it is an abstraction of a reachable concrete state $q_C$.
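
As a minimal sketch of the abstraction function, the following Python fragment maps a concrete state to the tuple of truth values of the predicates. The record fields `x` and `y` and the two predicates are assumptions made for the example only.

```python
def alpha(predicates, q_c):
    """Map a concrete state to the tuple of truth values of the predicates."""
    return tuple(bool(phi(q_c)) for phi in predicates)

# Hypothetical concrete state: a record with two integer fields.
predicates = [
    lambda s: s["x"] >= 0,       # phi_1
    lambda s: s["x"] < s["y"],   # phi_2
]
q_a = alpha(predicates, {"x": 3, "y": 10})
```

With N predicates there are at most 2^N abstract states, however large the concrete domain is.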

The reachable state space can be used to check invariants. If the user knows
what invariants he or she wants to prove, these invariants are supplied as some
of the predicates $\phi_i$ (actually, the invariant may sometimes be decomposed into
a conjunction of simpler properties). If $q_A(i)$ is true in all reachable abstract
states, the invariant has been proven. In addition, a BDD describing the abstract
reachable state space can be converted into an invariant for the concrete state
space by concretizing it, as described below.



Approximating the Abstract Reachable State Space 

Sets of abstract or concrete states are represented using logical formulas.
Abstract states are represented using BDDs, which can be regarded as propositional
formulas, by associating Boolean variables $B_1, \ldots, B_N$ with the truth values of
the corresponding predicates. The concrete domain is not necessarily finite, so
the concrete state space is represented using first-order formulas.

¹ Initialization depends on the values of the state variables, which are unconstrained,
so as to allow nondeterministic choice of start states. The initialization rule is also
conveniently similar to the transition rules of the system.

If $s_C$ is a set of concrete states, $\alpha(s_C)$ will be taken to be $\{\alpha(q_C) \mid q_C \in s_C\}$.
The concretization function $\gamma$ is the inverse image of $\alpha$: $\gamma(s_A) = \{q_C \mid \alpha(q_C) \in s_A\}$.
Note that $s_C \subseteq \gamma(\alpha(s_C))$. If $\psi_A$ is a propositional formula (e.g., a
BDD) over the variables $B_i$ representing the set $s_A$, a first-order formula $\psi_C$
representing $\gamma(s_A)$ can be computed by substituting each predicate $\phi_i$ for $B_i$ in $\psi_A$.
An approximation of the reachable state space of the abstract system is com- 
puted by (the usual) breadth-first symbolic traversal. At any time, the algorithm 
has a BDD representing the current abstract reachable set. Initially, this formula 
represents an abstraction of the initial states of the concrete system. Then, the 
algorithm iteratively computes an over-approximation of the set of all successors 
of the current reachable set. At the end of the next iteration, the formula is the 
logical disjunction of the formula for the current reachable set and the formula 
for its successor set. 
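
The traversal just described is a standard least fixed point computation. A minimal Python sketch, using explicit sets of states instead of BDDs and a made-up successor relation, is:

```python
def reachable(initial, successors):
    """Least fixed point: repeatedly add the successors of the current set."""
    current = frozenset(initial)
    while True:
        nxt = current | frozenset(successors(current))
        if nxt == current:          # nothing new was added, so we are done
            return current
        current = nxt

# Hypothetical abstract system with states 0..7.
def successors(states):
    return {(s + 2) % 8 for s in states}

reach = reachable({0}, successors)
```

Termination is guaranteed whenever the abstract state space is finite, because the set grows strictly at every iteration but the last.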

The key step in this procedure is how to find the formula for the set of
successors. Given a BDD $\psi_A$ which characterizes $s_A$, find a BDD $\psi'_A$ characterizing
the successors of $s_A$ in the abstract system. It is sufficient to compute the
successors contributed by each concrete transition function $f$, since the set of abstract
successors is the union of the successors contributed by the individual functions.
The formula for the initial abstract states is computed by finding the possible
successors of the entire state space under the “transition function” init (in other
words, finding the formula for the successors of true under init).

The abstract successors are computed by a method similar to that of Graf
and Saidi, but using recursive subdivision of the concrete state space. The first
step computes $\psi_C = \gamma(\psi_A)$ by substitution (as described above); $\psi_C$ represents
the set of all states that could abstract to a state in $s_A$.

We assume that each transition function $f$ can be written as a first-order
term, which is also named $f$. Predicates $\phi'_i(x)$ that characterize the sets
$\{q_C \mid \phi_i(f(q_C))\}$ can be pre-computed by substituting the term $f(X)$ for $X$ in $\phi_i$.
Intuitively, $\phi'_i(x)$ means “x is a predecessor of a state that can satisfy $\phi_i$.”

We compute $\psi'_A$ by recursive case splitting on each bit $B_i$ in the abstract
formula, in ascending order of i.

$$
H(\psi, m) =
\begin{cases}
(B_m \wedge H(\psi \wedge \phi'_m,\ m+1)) \vee (\neg B_m \wedge H(\psi \wedge \neg\phi'_m,\ m+1)) & \text{if } 0 < m \le N \\
\mathit{true} & \text{if } m = N+1 \wedge \psi \text{ is satisfiable} \\
\mathit{false} & \text{if } m = N+1 \wedge \psi \text{ is unsatisfiable}
\end{cases}
$$

The formula $\psi$ is a Boolean combination of predicates $\phi_i$ for $m \le i \le N$. If $s$
is the set of concrete states represented by $\psi$, the function $H$ computes
a logical formula representing the set of abstract states $\alpha(f(s))$. If $m \le N$, it
splits $s$ into two parts, $s'$ and $s''$, by conjoining $\psi$ with $\phi'_m$ and then $\neg\phi'_m$. $H$ is
then called recursively to compute $\alpha(f(s'))$ and $\alpha(f(s''))$. When $m = N + 1$,
every $\phi'_i$ has been assumed true or false in $\psi$, so $\psi$ is equivalent to one of these
values.
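
The recursive case split can be sketched in Python. This is an illustration under strong assumptions, not the prototype's implementation: the concrete domain is a small finite range so that satisfiability can be decided by enumeration (standing in for the decision procedure), formulas are Python functions, indices are 0-based, and the result is an explicit set of abstract states rather than a BDD.

```python
def abstract_successors(psi, predicates, f, domain):
    """Abstract states alpha(f(s)) for s = {q in domain | psi(q)}, by case splitting."""
    n = len(predicates)
    # pre[i](x) = phi_i(f(x)): "x is a predecessor of a state satisfying phi_i".
    pre = [lambda q, phi=phi: phi(f(q)) for phi in predicates]

    def satisfiable(formula):
        # Brute-force stand-in for the validity checker.
        return any(formula(q) for q in domain)

    def h(formula, m, bits):
        if not satisfiable(formula):     # an unsatisfiable branch contributes nothing
            return set()
        if m == n:                       # every predecessor predicate has been decided
            return {bits}
        pos = lambda q: formula(q) and pre[m](q)
        neg = lambda q: formula(q) and not pre[m](q)
        return h(pos, m + 1, bits + (True,)) | h(neg, m + 1, bits + (False,))

    return h(psi, 0, ())

# Hypothetical example: states 0, 1, 2 under the transition x := x + 1.
predicates = [lambda x: x == 0, lambda x: x > 2]
result = abstract_successors(lambda q: q <= 2, predicates, lambda x: x + 1, range(6))
```

Checking satisfiability before splitting prunes whole subtrees, so the number of recursive calls is often far below the worst case of 2^N.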

Several important optimizations are not shown. First, $H(\mathit{false}, m)$ is always
false, so we check whether $\psi$ is satisfiable at each step, using SVC. Second,
$H(\psi, m)$ is saved in a table the first time it is computed; this table is checked
to see if the needed value is available before computing $H$ recursively. Finally,
the propositional operations are performed using a BDD library, so common
subexpressions are shared.



Dealing with Indexed Sets of Transitions 

Murφ--, like Murφ before it, allows the user to define a set of transition rules
that vary over an index variable. There is a construct called a “ruleset,” which
declares an index variable that can be used in the code for transition rules
contained in the ruleset. This feature is useful for describing collections of nearly
identical processes.

Ruleset parameters are encoded as accesses to an infinite array, indexed by 
the natural numbers, whose entries are rule indices. The contents of the array 
are unconstrained, so it serves as a source of nondeterministic choices. The ith 
element of the array is looked up to determine the choice of the transition rule 
to execute in the ith step of a computation. 
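
The encoding of ruleset parameters can be pictured with the following Python sketch, in which the unconstrained array is modeled as a function from step numbers to rule indices. The three-process counter system and the particular `choices` function are assumptions made for the example.

```python
def run(init_state, rules, choices, steps):
    """Execute `steps` transitions; choices(i) selects the rule index used in step i."""
    state = init_state
    trace = [state]
    for i in range(steps):
        rule_index = choices(i)           # unconstrained: any index may be returned
        state = rules(rule_index, state)
        trace.append(state)
    return trace

# Hypothetical ruleset: rule p increments counter p of a 3-process system.
def rules(p, state):
    return tuple(c + 1 if j == p else c for j, c in enumerate(state))

trace = run((0, 0, 0), rules, choices=lambda i: (2 * i) % 3, steps=3)
```

Because nothing is assumed about `choices`, a property proved for all such functions holds for every interleaving of the indexed rules.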

Stating properties of parameterized systems requires quantified formulas, but
SVC can only decide quantifier-free formulas. The prototype verifier copes with
quantifiers using some simple heuristics:

— In parameterized processes, the concrete variables associated with each pro- 
cess are frequently stored in an array, so quantified variables are instantiated 
with all array index expressions. 

— Since SVC checks validity, variables that are universally quantified outside 
of the scope of an existential quantifier can be replaced by a fresh symbolic 
constant (which is distinct from all other names in the formula). Instantiation 
of quantifiers with these fresh variables is also useful. 

— As a last resort, the system allows the user to supply hints about how to 
instantiate (and not instantiate) variables. 
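
The first heuristic can be sketched as follows. The sketch works on formulas written as plain strings and is only meant to show the idea of instantiating a quantified variable with every array index expression that occurs in the goal; the string syntax and the helpers `index_terms` and `instantiate` are hypothetical and unrelated to the SVC input language.

```python
import re

def index_terms(formula):
    """Collect the expressions used as array indices, e.g. 'i' in 'cache[i]'."""
    return sorted(set(re.findall(r"\[([^\[\]]+)\]", formula)))

def instantiate(bound_var, body, goal):
    """Instances of a universally quantified body over the index terms of the goal."""
    pattern = re.compile(rf"\b{re.escape(bound_var)}\b")
    instances = []
    for term in index_terms(goal):
        # A function replacement avoids backslash interpretation in `term`.
        instances.append(pattern.sub(lambda _match, t=term: t, body))
    return instances

hyp_body = "cache[p] != exclusive"     # body of: forall p. cache[p] != exclusive
goal = "cache[i] != exclusive or cache[j + 1] == shared"
instances = instantiate("p", hyp_body, goal)
```

The resulting quantifier-free instances can be conjoined and passed to a quantifier-free decision procedure; the method is incomplete, which is why user hints remain necessary as a last resort.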

These measures are barely adequate; more sophisticated handling of quantifiers 
is required in the future. 



3 FLASH Cache Coherence Protocol Example 

One advantage of predicate abstraction is that it can be used to strengthen 
invariants, automatically. This is potentially valuable, since finding appropriate 
invariants is one of the most difficult aspects of verifying a design using a theorem 
prover. 

This technique was evaluated on a protocol that was previously verified by
several methods: the Stanford FLASH multiprocessor cache coherence protocol.
The model of the cache coherence protocol consists of a set of nodes, each of
which contains a processor, caches, and a portion of global memory of the
system. Each cache line-sized block in memory is associated with a directory header
which keeps information about the line. The state of a cached copy is either
invalid, shared (readable), or exclusive (readable and writable). The distributed
nodes communicate using asynchronous messages through a point-to-point
network.

This protocol has been verified using an aggregation abstraction with the help
of a theorem prover. This proof required many lemmas that showed that various
pairs of actions commute (produce the same state, regardless of execution
order). However, the lemmas don’t hold in arbitrary system states; instead, it
is necessary to prove an invariant that characterizes the reachable states, then 
prove that the lemma holds given the invariant. Finding this invariant was the 
most difficult part of the proof. A more detailed description of the protocol and 
the proof can be found in [14]. 

To prove the invariants, it is necessary to strengthen them until they are 
inductive (strengthening them is equivalent to finding an induction hypothesis). 
In practice, strengthening an invariant is a trial-and-error process involving re- 
peated failed proofs, from which new properties must be manually extracted. 
This usually requires many iterations, and each iteration is difficult. 

Predicate abstraction makes invariant strengthening easier. The user sup- 
plies plausible properties that might be useful in strengthening the invariant, 
and the system automatically tries various Boolean combinations of these con- 
ditions until it is able to prove the property (or not). This saves the effort of 
trying Boolean combinations by hand. When the abstract reachability analysis 
generates a state where the candidate invariant does not hold, it is possible to 
report an abstract state, along with a concrete transition that enters the state. 
This information may suggest additional predicates that should be added. 
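
The final check on the abstract reachable set can be sketched as below. The sketch assumes the reachable abstract states are available as an explicit set of Boolean tuples (the prototype uses a BDD) and that the candidate invariant is one of the predicates; the function name and the example sets are hypothetical.

```python
def check_invariant(reachable_states, invariant_index):
    """Return (True, None) if the chosen bit holds in every reachable abstract state,
    otherwise (False, a violating abstract state)."""
    for q_a in sorted(reachable_states):
        if not q_a[invariant_index]:
            return False, q_a
    return True, None

# Hypothetical reachable sets over two predicates; bit 0 is the candidate invariant.
ok, witness = check_invariant({(True, False), (True, True)}, 0)
bad, cex = check_invariant({(True, True), (False, True)}, 0)
```

A violating abstract state such as `cex` shows which combination of predicate values the abstraction could not rule out, which is a natural starting point when looking for a predicate to add.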

To use predicate abstraction for invariant strengthening, the user starts with 
a description of the system and some (relatively simple) invariants that are 
sufficient conditions to prove the verification conditions of interest. For example, 
a desired property of FLASH was that there be at most one exclusive copy of a 
memory line in the system. To prove this, two predicates were supplied initially: 

— There are no exclusive copies. 

— There is a single exclusive copy 

The invariants discovered using these properties are not strong enough, so two 
more properties were added about the PUTX message, which is a message from 
the directory to the cache that wants an exclusive copy. 

— There are no PUTX replies in the network. 

— There is a single PUTX reply in the network 

The Murφ-- description of the protocol used in this test was somewhat
different from the PVS description used in the aggregation proof. The first
simplification was modeling the memory as a separate node in the machine, when in
fact memory is stored in processing nodes. This simplification was necessitated
by the inefficient treatment of quantifiers in the current Murφ-- prototype. The
second simplification was the result of a limitation of Murφ--: In the PVS
description, the directory entry for a memory block maintained a count of sharers
(read-only cached copies of the memory block). There was no easy way to count
the number of actual sharers in Murφ--, so this was changed to be the set of
sharing nodes, instead of a count.² In spite of these compromises, we believe
that the problem of invariant strengthening for the modified FLASH protocol is
quite difficult, and the ability to solve it with Murφ-- indicates that predicate
abstraction is an effective approach to this problem.

One of the interesting challenges presented by the FLASH protocol is finding
invariants for an unknown number of processes. As with the original description,
the protocol description is parameterized for an unknown number of processes. The
caches are modeled as an unbounded array indexed by node indices. This tends
to lead to predicates and properties to prove that are quantified over all process
indices. For instance, the property that there should be no write-back request
when there exists any exclusive copy of the memory line in the whole system
can be specified with a universal quantifier as

$\forall p : (\mathit{cache}[p].\mathit{state} = \mathit{exclusive} \Rightarrow \mathit{netWB} = \mathit{empty})$.

As explained in Section 2, Murφ-- is able to handle quantified predicates, albeit
sub-optimally, by trying many instantiations without human interaction. This
capability was critical for completing the proof with reasonable effort.

Overall, we estimate that finding the invariants with predicate abstraction 
was at least an order-of-magnitude easier than finding them by trial and error 
with PVS. It required no more than five days of user time and two hours of CPU 
time to strengthen the invariants. 



4 Garbage Collection Example 

The most ambitious example we have attempted is the on-the-fly garbage collec- 
tion algorithm, which was first proposed by Dijkstra, et al. [4]. The algorithm is 
widely acknowledged to be difficult to get right, and difficult to prove. A more 
detailed discussion of the subtlety of this algorithm and subsequent variations 
can be found in a paper by Havelund and Shankar [11]. 

An extended version of this algorithm which can handle multiple concurrent 
mutators was used as the garbage collector of Concurrent Caml Light. The proof 
of the safety property required 58 invariants to be proved. Details of the modified 
algorithm and its proof are discussed in [6] and [5]. 

The original algorithm was simplified by Ben-Ari [2] to involve two colors
instead of three. This also led to a simpler argument of correctness. Alternative
justifications of Ben-Ari’s algorithm were also given by Van de Snepscheut [18]
and Pixley [15]. However, these proofs were informal pencil and paper proofs.

² This problem could possibly have been addressed by writing a recursive function to
count the sharing nodes, then verifying some properties of it as in the proof of the
garbage collection algorithm. We haven’t tried this yet.

Later, this modified algorithm was mechanically proved by Russinoff [16] 
using the Boyer-Moore theorem prover. A formulation of the same algorithm 
was also proved by Havelund and Shankar in PVS [10] and [11]. The proofs of 
both [10] and [11] were of approximately the same size. The proofs needed 19 
invariant lemmas and 57 function lemmas and [11] took about two months. So far 
as we know, no one has mechanically proved the original algorithm of Dijkstra, 
et al. 

In the garbage collection algorithm, the collector and the mutator (which mo- 
dels the behavior of the user program by nondeterministically changing pointers) 
run concurrently with both processes accessing a shared memory. The memory 
is abstractly modeled as a directed graph with each node having at most two 
outgoing edges. A subset of these nodes are called roots; they are special in the 
sense that they are always accessible (our proof of the algorithm assumes with- 
out loss of generality that there is only one root node). Any node that can be 
reached from one of the roots by following edges is also accessible. The mutator 
is allowed to choose an arbitrary node and redirect one of its edges to an ar- 
bitrarily chosen accessible node. Each memory node also has a color field which 
the collector uses to keep track of the accessible nodes. The collector adds nodes 
that are not accessible to the mutator, so-called garbage nodes, to a free-list for 
recycling. 

The mutator, which is described in pseudo-code in Figure 1, first redirects
an edge of an arbitrarily selected accessible node towards an arbitrary accessible
node (acc(j) says j is accessible). It then colors the second node gray if it was
white, or otherwise does nothing. Part of the subtlety of the algorithm is that
the collector can mark nodes between these two steps of the mutator.

The collector finds the nodes that are not reachable from the roots, so they 
can be added to the free list. It begins by coloring the root nodes gray (“coloring
a node gray” is called shading from now on). Then it iterates through all the
nodes; whenever it finds a gray node, it shades its successors and colors the node
black. After this the collector starts this iteration again. The collector algorithm 
is presented in Figure 1. 

The basic property to prove is that the collector does not free an accessible 
node. An extra state variable called error was added to the collector, which is 
set to true if the collector ever frees an accessible node, reducing the desired 
property to an invariant that error is never true. 

Most of the predicates were simply guards from the Murφ-- description of
the algorithm or derived directly from the invariant to be proved. Some required
insight, however. Two predicates are needed because, when the collector is in
the marking phase, the mutator can change the color of a node to gray, in which
case there must already exist a gray node yet to be examined by the collector.

$\forall x \in [i, M) : \mathit{color}[x] \neq \mathit{gray}$
$\exists y \in [0, M) : \mathit{color}[y] = \mathit{gray}$

/* mutator */
while (true)
    choose n, k ∈ [0, M),
        s.t. acc(k) = true
    /* choose to change left or right */
    [ left[n] := k; q := k
    □ right[n] := k; q := k ]
    if color[q] = white
        color[q] := gray; fi
end /* while */

/* collector */
shade all roots;
error := false;
i := 0; k := M;

/* marking phase */
do (k > 0) →
    c := color[i];
    if c = gray →
        k := M;
        shade left[i], right[i];
        color[i] := black;
    □ c ≠ gray → k := k - 1
    fi;
    i := (i + 1) mod M
od

/* collecting phase */
j := 0;
do (j < M) →
    c := color[j];
    if c = white →
        if acc(j) → error := true fi
        append j to free list
    □ c ≠ white → color[j] := white
    fi;
    j := j + 1
od

Fig. 1. Mutator and Collector Algorithms
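
To make the collector of Figure 1 concrete, here is a Python sketch of one collector cycle run in isolation, with no mutator interleaved. It assumes a single root at node 0 and takes "shading" to mean turning a white node gray while leaving gray and black nodes unchanged; the function names and the example graph are made up for the illustration.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def accessible(left, right, root=0):
    """Nodes reachable from the root by following left/right edges."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend((left[n], right[n]))
    return seen

def shade(color, n):
    """Color a white node gray; other colors are left alone (an assumption)."""
    if color[n] == WHITE:
        color[n] = GRAY

def collect(left, right, color, root=0):
    """One collector cycle without a concurrent mutator; returns (free_list, error)."""
    m = len(color)
    shade(color, root)
    error, i, k = False, 0, m
    while k > 0:                       # marking phase
        if color[i] == GRAY:
            k = m
            shade(color, left[i])
            shade(color, right[i])
            color[i] = BLACK
        else:
            k -= 1
        i = (i + 1) % m
    acc = accessible(left, right, root)
    free_list = []
    for j in range(m):                 # collecting phase
        if color[j] == WHITE:
            if j in acc:
                error = True           # an accessible node would be freed
            free_list.append(j)
        else:
            color[j] = WHITE
    return free_list, error

# Nodes 0 and 1 are accessible from the root; nodes 2 and 3 are garbage.
left, right = [1, 1, 3, 2], [0, 0, 2, 3]
color = [WHITE] * 4
free_list, error = collect(left, right, color)
```

Without a mutator the marking phase is an ordinary graph traversal; the difficulty addressed in this section comes entirely from interleaving the mutator's two steps with this loop.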



The correctness of the algorithm also depends on the invariant that a black 
node never has a white successor (except in the transitory case where the mutator 
is about to shade the white successor). 

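$\forall x \in [0, M) : (\mathit{color}[x] = \mathit{black} \Rightarrow (\mathit{color}[\mathit{left}[x]] \neq \mathit{white} \vee q = \mathit{left}[x]))$

$\forall x \in [0, M) : (\mathit{color}[x] = \mathit{black} \Rightarrow (\mathit{color}[\mathit{right}[x]] \neq \mathit{white} \vee q = \mathit{right}[x]))$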



Verifying Properties of Graphs 

A major difficulty with verifying the garbage collection algorithm using
predicate abstraction is that its correctness depends on some simple properties of
graphs that are not easy to prove by simple instantiation of quantifiers
(induction is actually needed). These properties are given as axioms to the verifier
when verifying the algorithm, and are proved by using predicate abstraction on
“auxiliary” Murφ-- programs that compute the graph properties.

For example, the following property about the function acc is necessary:

$$
\begin{aligned}
& (\mathit{color}[0] = \mathit{black}) \\
& \wedge\ (\forall p \in [0, M) : \mathit{color}[p] = \mathit{black} \Rightarrow (\mathit{color}[\mathit{left}[p]] = \mathit{black} \wedge \mathit{color}[\mathit{right}[p]] = \mathit{black})) \\
& \Rightarrow (\forall q \in [0, M) : \mathit{acc}(\mathit{left}, \mathit{right})(q) \Rightarrow (\mathit{color}[q] = \mathit{black})) \qquad (1)
\end{aligned}
$$

(The function acc is actually a function of the graph structure of the nodes, so
left and right are its arguments.)

Another axiom says that redirecting an edge to point to an already accessible
node never makes a previously inaccessible node accessible. In the following,
write(left, q, p) represents an array which is the same as left except that it has
the value p at index q. There is a similar equation for redirecting the right side.

$$
\begin{aligned}
& \forall p, q, r \in [0, M) : (\mathit{acc}(\mathit{left}, \mathit{right})(p) \wedge \mathit{acc}(\mathit{write}(\mathit{left}, q, p), \mathit{right})(r)) \\
& \quad \Rightarrow \mathit{acc}(\mathit{left}, \mathit{right})(r) \qquad (2)
\end{aligned}
$$
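
Property 2 can be sanity-checked by exhaustive enumeration on very small graphs. The Python sketch below is such a check, not a proof: it assumes node 0 is the root, enumerates every graph with `m` nodes, and uses made-up helper names.

```python
from itertools import product

def acc(left, right):
    """Set of nodes reachable from root 0 via left/right edges."""
    seen, stack = set(), [0]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack += [left[n], right[n]]
    return seen

def write(array, index, value):
    """Copy of `array` with `value` stored at `index`."""
    updated = list(array)
    updated[index] = value
    return updated

def property_2_holds(m):
    """Exhaustively test property 2 on every graph with m nodes."""
    nodes = range(m)
    for left in product(nodes, repeat=m):
        for right in product(nodes, repeat=m):
            before = acc(left, right)
            for p, q, r in product(nodes, repeat=3):
                after = acc(write(left, q, p), right)
                if p in before and r in after and r not in before:
                    return False
    return True

holds_for_three_nodes = property_2_holds(3)
```

Such a bounded check only covers the enumerated sizes; the general statement still needs the inductive argument supplied by the auxiliary program.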

The most difficult property required some insight. It states that if the root
node of the graph is gray in color and all other nodes are either gray or white
then, for every accessible white node, there exists a path from a gray node to it,
entirely through white nodes.

$$
\begin{aligned}
& (\mathit{color}[0] = \mathit{gray} \wedge \forall x \in [0, M) : \mathit{color}[x] = \mathit{white} \wedge \mathit{acc}(\mathit{left}, \mathit{right})(x)) \\
& \quad \Rightarrow \exists y \in [0, M) : \mathit{color}[y] = \mathit{gray} \wedge \mathit{reachable\_white}(\mathit{left}, \mathit{right})(y, x) \qquad (3)
\end{aligned}
$$

where reachable_white is a similarly recursive definition that says there is a path
of all white nodes from left to right.

It is frequently possible to write an auxiliary Murφ-- program that computes
a graph property, then verify some predicates on this algorithm. The verified
properties are then used as axioms in the main verification effort. These auxiliary
programs are not tricky to write, because they do not require concurrency.
Although this method is currently ad hoc, it seems that the properties we
encountered, and many others, could be written as simple recursive definitions and
then translated by some provably correct algorithm to a Murφ-- program that
computes the same property.

For example, starting with a simple recursive definition of accessibility,

$$
\mathit{acc}(0) \land (\forall x \in [0, M) : \mathit{acc}(x) \Rightarrow (\mathit{acc}(\mathit{left}[x]) \land \mathit{acc}(\mathit{right}[x]))),
$$

it is simple to write a Murφ program that sets the entries of an array acc[i] to
true or false depending on whether node i is accessible.

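To prove property 1, we assume that the array color is initialized so that

$$
(\mathit{color}[0] = \mathit{black}) \land (\forall p \in [0, M) : \mathit{color}[p] = \mathit{black} \Rightarrow (\mathit{color}[\mathit{left}[p]] = \mathit{black} \land \mathit{color}[\mathit{right}[p]] = \mathit{black}))
$$

and then check the abstract state space with the predicate
$\forall x : \mathit{acc}[x] \Rightarrow \mathit{color}[x] = \mathit{black}$.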
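
The auxiliary-program idea can be illustrated with a small sketch. The code below is written in Python rather than Murφ, and it checks property 1 by brute-force enumeration of small graphs instead of by predicate abstraction, so it only illustrates what the auxiliary program computes. The names `compute_acc`, `property1_holds`, and `check_all_small_graphs` are hypothetical, and the sketch assumes node 0 is the root and that acc is the least set satisfying the recursive definition.

```python
from collections import deque
from itertools import product


def compute_acc(left, right):
    """Return a list acc where acc[i] is True iff node i is reachable from root 0."""
    m = len(left)
    acc = [False] * m
    acc[0] = True                      # the root is accessible by definition
    work = deque([0])
    while work:
        x = work.popleft()
        for y in (left[x], right[x]):  # children of an accessible node are accessible
            if not acc[y]:
                acc[y] = True
                work.append(y)
    return acc


def property1_holds(color, left, right):
    """Check property 1 on one concrete graph and coloring."""
    m = len(color)
    premise = color[0] == "black" and all(
        color[left[p]] == "black" and color[right[p]] == "black"
        for p in range(m)
        if color[p] == "black"
    )
    if not premise:
        return True                    # the implication holds vacuously
    acc = compute_acc(left, right)
    return all(color[q] == "black" for q in range(m) if acc[q])


def check_all_small_graphs(m):
    """Exhaustively check property 1 for every graph and two-coloring with m nodes."""
    nodes = range(m)
    for left in product(nodes, repeat=m):
        for right in product(nodes, repeat=m):
            for color in product(("black", "white"), repeat=m):
                if not property1_holds(color, left, right):
                    return False
    return True
```

An exhaustive check for a fixed small M is of course no substitute for the parameterized proof described in the text; it is only a quick sanity test of a candidate axiom before it is handed to the verifier.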

A similar approach was used to prove property 2. This property was slightly
more complex, since the function needed to be computed twice: once on the 
original memory structure and once after the mutator has redirected an edge in 
the memory graph. 

As might be expected from its complexity, property 3 was somewhat more
difficult to prove. We provided an auxiliary Murφ program that, given a white
accessible node, finds the witness to the existential quantifier in the consequent.

We were able to prove this algorithm correct in about seven days. The machine
time required to prove the final version of the garbage collection algorithm
is about three hours. Finding appropriate abstraction predicates took much of
the time, and required an understanding of the algorithm. Typically we would
start with some invariants that seemed likely to hold in the system as part of
the abstract state. More often than not, the proof process would generate traces
where the candidate invariant would fail. This mostly happened for one of two
reasons:

— We left out some “obvious” axiom about acc. 

— The invariant did not hold in some situations and needed to be tweaked
to get it right. This required either changing the predicate or adding other
predicates.

During the proof process we also discovered some bugs which were accidentally 
added while coding the algorithm. Of course, much of the human time was spent 
figuring out what the axioms should be and how to prove them. 

5 Conclusions 

Based on the experiences reported here, we believe that predicate abstraction 
can be a very cost-effective verification technique for non-finite problems such 
as parameterized systems. 

Predicate abstraction could be regarded as an infinite-state alternative to 
model checking. However, we believe it would be most valuable as a method
for checking or strengthening invariants in a larger verification effort involving 
other tools, especially interactive theorem provers. 

The Murφ verifier is a prototype for evaluating ideas, not a polished tool.
To be generally useful, every aspect of the Murφ system needs additional
work (including a name change). In particular, there is a need for better support
for quantifiers, and more generally efficient and powerful decision procedures.

Abstract. Of special interest in formal verification are safety properties, which
assert that the system always stays within some allowed region. A computation that 
violates a general linear property reaches a bad cycle, which witnesses the violation 
of the property. Accordingly, current methods and tools for model checking of 
linear properties are based on a search for bad cycles. A symbolic implementation 
of such a search involves the calculation of a nested fixed-point expression over 
the system’s state space, and is often very difficult. Every computation that violates 
a safety property has a finite prefix along which the property is violated. We use 
this fact in order to base model checking of safety properties on a search for finite 
bad prefixes. Such a search can be performed using a simple forward or backward 
symbolic reachability check. A naive methodology that is based on such a search 
involves a construction of an automaton (or a tableau) that is doubly exponential in 
the property. We present an analysis of safety properties that enables us to prevent 
the doubly-exponential blow up and to use the same automaton used for model 
checking of general properties, replacing the search for bad cycles by a search for 
bad prefixes. 



1 Introduction 

Today’s rapid development of complex and safety-critical systems requires reliable veri- 
fication methods. In formal verification, we verify that a system meets a desired property 
by checking that a mathematical model of the system meets a formal specification that 
describes the property. Of special interest are properties asserting that observed behavior 
of the system always stays within some allowed set of finite behaviors, in which nothing 
“bad” happens. For example, we may want to assert that every message received was 
previously sent. Such properties of systems are called safety properties. Intuitively, a property
$\psi$ is a safety property if every violation of $\psi$ occurs after a finite execution of the
system. In our example, if in a computation of the system a message is received without 
previously being sent, this occurs after some finite execution of the system. 

Intel Corporation. Part of this work was done when this author was a Varon Visiting Professor at
the Weizmann Institute of Science.

In order to define safety properties formally, we refer to computations of a nonterminating
system as infinite words over an alphabet $\Sigma$. Typically, $\Sigma = 2^{AP}$, where AP is the
set of the system's atomic propositions. Consider a language L of infinite words over $\Sigma$.
A finite word x over $\Sigma$ is a bad prefix for L iff for all infinite words y over $\Sigma$, the concatenation
$x \cdot y$ of x and y is not in L. Thus, a bad prefix for L is a finite word that cannot be
extended to an infinite word in L. A language L is a safety language if every word not in L
has a finite bad prefix. For example, $L = \{0^\omega, 1^\omega\} \subseteq \{0, 1\}^\omega$ is a safety language: every
word not in L contains 01 or 10, and a prefix that ends in one of these sequences cannot
be extended to a word in L. The definition of safety we consider here is given in [AS85];
it coincides with the definition of limit closure defined in [Eme83], and is different from
the definition in [Lam85], which also refers to the property being closed under stuttering.
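
To make the notion of a bad prefix concrete, here is a minimal Python sketch for the example language consisting of the all-zeros word and the all-ones word. The function names are hypothetical, and finite words are represented as Python strings over the characters `0` and `1`; this is an illustration of the definition, not a general procedure.

```python
def is_bad_prefix(x: str) -> bool:
    """True iff the finite word x cannot be extended to 000... or 111...

    For this particular language, x is a bad prefix exactly when it
    contains both letters, i.e. it contains the factor 01 or 10.
    """
    return "0" in x and "1" in x


def minimal_bad_prefix(finite_word: str):
    """Return the shortest bad prefix of finite_word, or None if there is none."""
    for i in range(1, len(finite_word) + 1):
        if is_bad_prefix(finite_word[:i]):
            return finite_word[:i]
    return None
```

For instance, `minimal_bad_prefix("00100")` yields the prefix `001`, which ends in the sequence 01 as described above, while no prefix of `111` is bad.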

Linear properties of nonterminating systems are often specified using Büchi automata
on infinite words or linear temporal logic (LTL) formulas. We say that an automaton A
is a safety automaton if it recognizes a safety language. Similarly, an LTL formula is
a safety formula if the set of computations that satisfy it forms a safety language. Sistla
shows that the problem of determining whether a nondeterministic Büchi automaton or an
LTL formula is safety is PSPACE-complete [Sis94] (see also [AS87]). From the results
in [KV97], it follows that the problem is in PSPACE even when the Büchi automaton is
alternating. On the other hand, when the Büchi automaton is deterministic, the problem can
be solved in linear time [MP92]. Sistla also describes sufficient syntactic requirements
for safe LTL formulas. For example, a formula (in positive normal form) whose only
temporal operators are G (always) and X (next) is a safety formula [Sis94]. Suppose that
we want to verify the correctness of a system with respect to a safety property. Can we
use the fact that the property is known to be a safety property in order to improve general
verification methods? The positive answer to this question is the subject of this paper.

Much previous work on verification of safety properties follows the proof-based
approach to verification [Fra92]. In the proof-based approach, the system is annotated with
assertions and proof rules are used to verify the assertions. In particular, Manna and Pnueli
consider verification of reactive systems with respect to safety properties in [MP92,
MP95]. The definition of safety formulas considered in [MP92,MP95] is syntactic: a safety
formula is a formula of the form $G\varphi$ where $\varphi$ is a past formula. The syntactic definition is
equivalent to the definition discussed here [MP92]. While proof-rules approaches are less
sensitive to the size of the state space of the system, they require heavy user support.
Our work here considers the state-exploration approach to verification, where automatic
model checking [CE81,QS81] is performed in order to verify the correctness of a system
with respect to a specification. Previous work in this subject considers special cases of
safety properties such as invariance checking [GW91,McM92,Val93,MR97], or assumes
that a general safety property is given by the set of its bad prefixes [GW91].

General methods for model checking of linear properties are based on a construction
of a tableau or an automaton $A_{\neg\psi}$ that accepts exactly all the infinite computations that
violate the property $\psi$ [LP85,VW94]. Given a system M and a property $\psi$, verification of
M with respect to $\psi$ is reduced to checking the emptiness of the product of M and $A_{\neg\psi}$
[VW86]. This check can be performed on-the-fly and symbolically [CVWY92,GPVW95,
TBK95]. When $\psi$ is an LTL formula, the size of $A_{\neg\psi}$ is exponential in the length of
$\psi$, and the complexity of verification that follows is PSPACE, with a matching lower
bound [SC85].

Consider a safety property $\psi$. Let $\mathit{pref}(\psi)$ denote the set of all bad prefixes for $\psi$. Recall
that every computation that violates $\psi$ has a prefix in $\mathit{pref}(\psi)$. We say that an automaton
on finite words is tight for a safety property $\psi$ if it recognizes $\mathit{pref}(\psi)$. Since every system
that violates $\psi$ has a computation with a prefix in $\mathit{pref}(\psi)$, an automaton tight for $\psi$ is
practically more helpful than $A_{\neg\psi}$. Indeed, reasoning about automata on finite words is
easier than reasoning about automata on infinite words (cf. [HKSV97]). In particular,
when the words are finite, we can use backward or forward symbolic reachability analysis
[BCM+92,IN97]. In addition, using an automaton for bad prefixes, we can return to the
user a finite error trace, which is a bad prefix, and which is often more helpful than an
infinite error trace.

Given a safety property $\psi$, we construct an automaton tight for $\psi$. We show that the
construction involves an exponential blow-up in the case $\psi$ is given as a nondeterministic
Büchi automaton, and involves a doubly-exponential blow-up in the case $\psi$ is given in LTL.
These results are surprising, as they indicate that detection of bad prefixes with a nondeterministic
automaton has the flavor of determinization. The tight automata we construct are
indeed deterministic. Nevertheless, our construction avoids the difficult determinization
of the Büchi automaton for $\psi$ (cf. [Saf88]) and just uses a subset construction.

Our construction of tight automata reduces the problem of verification of safety properties
to the problem of invariance checking [Fra92,MP92]. Indeed, once we take the
product of a tight automaton with the system, we only have to check that we never reach
an accepting state of the tight automaton. Invariance checking is amenable to both model
checking techniques [BCM+92,IN97] and deductive verification techniques [BM83,
SOR93,MAB+94]. In practice, the verified systems are often very large, and even clever 
symbolic methods cannot cope with the state-explosion problem that model checking 
faces. The way we construct tight automata also enables, in case the BDDs constructed 
during the symbolic reachability test get too large, an analysis of the intermediate data 
that has been collected. The analysis can lead to a conclusion that the system does not 
satisfy the property without further traversal of the system. 

In view of the discouraging blow-ups described above, we relax the requirement on
tight automata and seek, instead, an automaton that need not accept all the bad prefixes,
yet must accept at least one bad prefix of every computation that does not satisfy $\psi$.
We say that such an automaton is fine for $\psi$. For example, an automaton that recognizes
$p^* \cdot (\neg p) \cdot (p \vee \neg p)$ does not accept all the words in $\mathit{pref}(Gp)$, yet is fine for $Gp$. In
practice, almost all the benefit that one obtains from a tight automaton can also be obtained
from a fine automaton. We show that for natural safety formulas $\psi$, the construction of an
automaton fine for $\psi$ is as easy as the construction of $A_{\neg\psi}$. In order to formalize the notion of
"natural safety formulas", we partition safety properties into intentionally, accidentally,
and pathologically safe properties. While most safety properties are intentionally safe,
accidentally safe and especially pathologically safe properties contain some redundancy,
and we do not expect to see them often in practice. We show that the automaton $A_{\neg\psi}$,
which accepts exactly all infinite computations that violate $\psi$, can easily (and with no
blow-up) be modified to an automaton $A^{\mathit{true}}_{\neg\psi}$ on finite words, which is tight for $\psi$ that is
intentionally safe, and is fine for $\psi$ that is accidentally safe. We present a methodology for
model checking of safety properties that is based on the above classification, uses $A^{\mathit{true}}_{\neg\psi}$
instead of $A_{\neg\psi}$, and thus replaces the search for bad cycles by a search for bad prefixes.

2 Preliminaries 

2.1 Safety Languages and Formulas 

Consider a language $L \subseteq \Sigma^\omega$ of infinite words over the alphabet $\Sigma$. A finite word
$x \in \Sigma^*$ is a bad prefix for L iff for all $y \in \Sigma^\omega$, we have $x \cdot y \notin L$. A language L is a
safety language iff every $w \notin L$ has a finite bad prefix. For a safety language L, we denote
by pref(L) the set of all bad prefixes for L. We say that a set $X \subseteq \mathit{pref}(L)$ is a trap for a
safety language L iff every word $w \notin L$ has at least one prefix in X. We denote all the
traps for L by trap(L).

For a language $L \subseteq \Sigma^\omega$, we use comp(L) to denote the complement of L; i.e.,
$\mathit{comp}(L) = \Sigma^\omega \setminus L$. We say that a language $L \subseteq \Sigma^\omega$ is a co-safety language iff comp(L)
is a safety language. (The term used in [MP92] is guarantee language.) Equivalently, L
is co-safety iff every $w \in L$ has a good prefix $x \in \Sigma^*$ such that for all $y \in \Sigma^\omega$, we have
$x \cdot y \in L$. For a co-safety language L, we denote by co-pref(L) the set of good prefixes
for L. Note that co-pref(L) = pref(comp(L)).

For an LTL formula $\psi$ over a set AP of atomic propositions, let $\|\psi\|$ denote the set of
computations in $(2^{AP})^\omega$ that satisfy $\psi$. We say that $\psi$ is a safety formula iff $\|\psi\|$ is a safety
language. Also, $\psi$ is a co-safety formula iff $\|\psi\|$ is a co-safety language or, equivalently,
$\|\neg\psi\|$ is a safety language.

2.2 Word Automata 

Given an alphabet $\Sigma$, an infinite word over $\Sigma$ is an infinite sequence $w = \sigma_1 \cdot \sigma_2 \cdots$ of
letters in $\Sigma$. We denote by $w^l$ the suffix $\sigma_l \cdot \sigma_{l+1} \cdot \sigma_{l+2} \cdots$ of w. An automaton on infinite
words is $A = \langle \Sigma, Q, \delta, Q_0, F \rangle$, where $\Sigma$ is the input alphabet, Q is a finite set of states,
$\delta$ is a transition function, $Q_0 \subseteq Q$ is a set of initial states, and $F \subseteq Q$ is an acceptance
condition. When A is deterministic, the size of $Q_0$ is 1, and $\delta : Q \times \Sigma \to Q$ maps each state
and letter to a single successor state. When A is nondeterministic, $\delta : Q \times \Sigma \to 2^Q$ maps
each state and letter to a possible set of successor states. Since the choice of a successor
state is existential, we can regard a transition $\delta(q, \sigma) = \{q_1, q_2, q_3\}$ as a disjunction
$q_1 \vee q_2 \vee q_3$. Transitions of alternating automata can be arbitrary positive formulas over
Q. We can have, for instance, a transition $\delta(q, \sigma) = (q_1 \wedge q_2) \vee (q_3 \wedge q_4)$, meaning that
the automaton accepts from state q a suffix $w^l$, starting by $\sigma$, of w, if it accepts $w^{l+1}$ from
both $q_1$ and $q_2$ or from both $q_3$ and $q_4$. Such a transition combines existential and universal
choices. Runs of an alternating automaton are infinite trees, where branches correspond
to universal choices of the automaton. For example, if A is an automaton with an initial
state $q_{in}$ and $\delta(q_{in}, \sigma_1) = (q_1 \vee q_2) \wedge (q_3 \vee q_4)$, then possible runs of A on w have a root
labeled $q_{in}$, have one node in level 1 labeled $q_1$ or $q_2$, and have another node in level 1
labeled $q_3$ or $q_4$. When A is a Büchi automaton on infinite words, a run is accepting iff it
visits infinitely many states from F along each of its branches. The automaton A can also
run on finite words in $\Sigma^*$. Then, a run over a word in $\Sigma^n$ is accepting if it visits states in
F in all its nodes of level n. A word (either finite or infinite) is accepted by A iff there
exists an accepting run on it. The language of A, denoted $\mathcal{L}(A)$, is the set of words that
A accepts. Deterministic and nondeterministic automata can be viewed as special cases
of alternating automata. Formally, an alternating automaton is deterministic if for all q
and $\sigma$, we have $\delta(q, \sigma) \in Q \cup \{\mathbf{false}\}$, and it is nondeterministic if $\delta(q, \sigma)$ is always a
disjunction. For a detailed definition of alternating automata see [Var96].

We define the size of an alternating automaton $A = \langle \Sigma, Q, \delta, Q_0, F \rangle$ as the sum of $|Q|$
and $|\delta|$, where $|\delta|$ is the sum of the lengths of the formulas in $\delta$. We say that the automaton
A over infinite words is a safety (co-safety) automaton iff $\mathcal{L}(A)$ is a safety (co-safety)
language. We use pref(A), co-pref(A), trap(A), and comp(A) to abbreviate $\mathit{pref}(\mathcal{L}(A))$,
$\mathit{co\text{-}pref}(\mathcal{L}(A))$, $\mathit{trap}(\mathcal{L}(A))$, and $\mathit{comp}(\mathcal{L}(A))$, respectively. For an automaton A and a set
of states S, we denote by $A^S$ the automaton obtained from A by defining the set of initial
states to be S. We say that an automaton A over infinite words is universal iff $\mathcal{L}(A) = \Sigma^\omega$.
When A runs on finite words, it is universal iff $\mathcal{L}(A) = \Sigma^*$. An automaton is empty iff
$\mathcal{L}(A) = \emptyset$. A set S of states is universal (resp., rejecting), when $A^S$ is universal (resp.,
empty). Note that the universality problem for nondeterministic automata is known to be
PSPACE-complete [MS72,Wol82].

3 Detecting Bad Prefixes 

Linear properties of nonterminating systems are often specified using automata on infinite 
words or linear temporal logic (LTL) formulas. Given an LTL formula $\psi$, one can build
a nondeterministic Büchi automaton $A_\psi$ that recognizes $\|\psi\|$. The size of $A_\psi$ is, in the
worst case, exponential in $\psi$ [GPVW95,VW94]. In practice, when given a property that
happens to be safe, what we want is a nondeterministic automaton on finite words that 
detects bad prefixes. As we discuss in the introduction, such an automaton is easier to 
reason about. In this section we construct, from a given safety property, an automaton for 
its bad prefixes. 

We first study the case where the property is given by a nondeterministic Büchi automaton.
When the given automaton A is deterministic, the construction of an automaton
$A'$ for pref(A) is straightforward. Indeed, we can obtain $A'$ from A by defining the set of
accepting states to be the set of states s for which $A^{\{s\}}$ is empty. Theorem 1 below shows
that when A is a nondeterministic automaton, things are not that simple. While we can
avoid a difficult determinization of A [Saf88], we cannot avoid an exponential blow-up.

Theorem 1. Given a safety nondeterministic Büchi automaton A of size n, the size of an
automaton that recognizes pref(A) is $2^{\Theta(n)}$.

Proof. We start with the upper bound. Let $A = \langle \Sigma, Q, \delta, Q_0, F \rangle$. Recall that $\mathit{pref}(\mathcal{L}(A))$
contains exactly all prefixes $x \in \Sigma^*$ such that for all $y \in \Sigma^\omega$, we have $x \cdot y \notin \mathcal{L}(A)$.
Accordingly, the automaton for pref(A) accepts a prefix x iff the set of states that A
could be in after reading x is rejecting. Formally, we define the (deterministic) automaton
$A' = \langle \Sigma, 2^Q, \delta', \{Q_0\}, F' \rangle$, where $F'$ contains all the rejecting sets of A, and $\delta'$ follows
the subset construction induced by $\delta$; that is, for every $S \in 2^Q$ and $\sigma \in \Sigma$, we have
$\delta'(S, \sigma) = \bigcup_{s \in S} \delta(s, \sigma)$.
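
The upper-bound construction can be sketched in a few lines of Python. This is an explicit-state illustration under several assumptions: the Büchi automaton is given as a dictionary `delta` from (state, letter) pairs to sets of successor states, a state is treated as having a nonempty language iff it can reach an accepting state that lies on a cycle, and only the subsets reachable from the initial set are built. All function names are hypothetical.

```python
from collections import deque


def successors(delta, alphabet, q):
    """All states reachable from q in one step, over any letter."""
    out = set()
    for a in alphabet:
        out |= delta.get((q, a), set())
    return out


def reach_plus(delta, alphabet, q):
    """States reachable from q in one or more steps."""
    seen, work = set(), deque(successors(delta, alphabet, q))
    while work:
        s = work.popleft()
        if s not in seen:
            seen.add(s)
            work.extend(successors(delta, alphabet, s))
    return seen


def nonempty_states(states, alphabet, delta, accepting):
    """States from which the Buchi automaton accepts some infinite word."""
    plus = {q: reach_plus(delta, alphabet, q) for q in states}
    live = {f for f in accepting if f in plus[f]}   # accepting states on a cycle
    return {q for q in states if plus[q] & live}


def bad_prefix_dfa(states, alphabet, delta, initial, accepting):
    """Subset construction whose accepting subsets are the rejecting sets."""
    nonempty = nonempty_states(states, alphabet, delta, accepting)
    start = frozenset(initial)
    dfa_delta, dfa_accepting = {}, set()
    seen, work = {start}, deque([start])
    while work:
        S = work.popleft()
        if not (S & nonempty):          # every state in S has an empty language
            dfa_accepting.add(S)
        for a in alphabet:
            T = frozenset(t for s in S for t in delta.get((s, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    return dfa_delta, start, dfa_accepting


def accepts(dfa, word):
    """Run the deterministic automaton on a finite word."""
    dfa_delta, S, dfa_accepting = dfa
    for a in word:
        S = dfa_delta[(S, a)]
    return S in dfa_accepting


# Example: a two-state automaton for the language {000..., 111...}.
EXAMPLE = bad_prefix_dfa(
    states={"z", "o"},
    alphabet="01",
    delta={("z", "0"): {"z"}, ("o", "1"): {"o"}},
    initial={"z", "o"},
    accepting={"z", "o"},
)
```

In the example, reading `01` leads to the empty subset, which is trivially rejecting, so `accepts(EXAMPLE, "01")` reports a bad prefix, whereas `00` leads to the subset containing only `z` and is not accepted.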

We now turn to the lower bound. Essentially, it follows from the fact that pref(A) refers
to words that are not accepted by A, and hence, it has the flavor of complementation. Complementing
a nondeterministic automaton on finite words involves an exponential blow-up
[MF71]. In fact, one can construct a nondeterministic automaton $A = \langle \Sigma, Q, \delta, Q_0, Q \rangle$,
in which all states are accepting, such that the smallest nondeterministic automaton that
recognizes comp(A) has $2^{\Theta(n)}$ states. (To see this, consider the language consisting
of all words w such that either $|w| < 2n$ or $w = uvz$, where $|u| = |v| = n$ and $u \neq v$.)
Given A as above, let $A'$ be A when regarded as a Büchi automaton on infinite words. It
is not hard to see that $\mathit{pref}(A') = \mathit{comp}(A)$.

The lower bound in Theorem 1 is not surprising, as complementation of nondeterministic
automata involves an exponential blow-up, and, as we demonstrate in the lower-bound
proof, there is a tight relation between pref(A) and comp(A). We could hope, therefore,
that when properties are specified in a negative form (that is, they describe the forbidden 
behaviors of the system) or are given in LTL, whose formulas can be negated, detection 
of bad prefixes would not be harder than detection of bad computations. In Theorems 2 
and 3 we refute this hope. 

Theorem 2. Given a co-safety nondeterministic Büchi automaton A of size n, the size of
an automaton that recognizes $\mathit{co\text{-}pref}(\mathcal{L}(A))$ is $2^{\Theta(n)}$.

Proof. The upper bound is similar to the one in Theorem 1, only that now we define the
set of accepting states in $A'$ as the set of all the universal sets of A. We prove a matching
lower bound. For $n \geq 1$, let $\Sigma_n = \{1, \ldots, n, \&\}$. We define $L_n$ as the language of
all words $w \in \Sigma_n^\omega$ such that w contains at least one & and the letter after the first &
is either & or it has already appeared somewhere before the first &. The language $L_n$
is a co-safety language. Indeed, each word in $L_n$ has a good prefix (e.g., the one that
contains the first & and its successor). We can recognize $L_n$ with a nondeterministic
Büchi automaton with $O(n)$ states (the automaton guesses the letter that appears after the
first &). Obvious good prefixes for $L_n$ are 12&&, 123&2, etc. We can recognize these
prefixes with a nondeterministic automaton with $O(n)$ states. But $L_n$ also has some less
obvious good prefixes, like $1234 \cdots n\&$ (a permutation of $1 \ldots n$ followed by &). These
prefixes are indeed good, as every suffix we concatenate to them would start in either
& or a letter in $\{1, \ldots, n\}$ that has appeared before the &. To recognize these prefixes,
a nondeterministic automaton needs to keep track of subsets of $\{1, \ldots, n\}$, for which it
needs $2^n$ states. Consequently, a nondeterministic automaton for $\mathit{co\text{-}pref}(L_n)$ must have
at least $2^n$ states.
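
The two kinds of good prefixes in this proof can be told apart with a short Python sketch. It is only an illustration of the language used in the lower bound, not part of the proof: finite words are represented as Python lists whose elements are the integers 1 to n or the string `"&"`, and the function name `is_good_prefix` is hypothetical.

```python
AMP = "&"


def is_good_prefix(x, n):
    """True iff every infinite extension of the finite word x lies in L_n."""
    if AMP not in x:
        return False                  # an extension without '&' is not in L_n
    k = x.index(AMP)                  # position of the first '&'
    before = set(x[:k])               # letters seen before the first '&'
    if k + 1 < len(x):
        # The letter after the first '&' is already fixed by the prefix.
        nxt = x[k + 1]
        return nxt == AMP or nxt in before
    # The prefix ends with the first '&': it is good only if every letter
    # 1..n has already appeared, so any next letter satisfies the condition.
    return before == set(range(1, n + 1))
```

The last case is the one that forces the automaton to remember the set `before`, which can be any subset of the n letters.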

We now extend the proof of Theorem 2 to get a doubly-exponential lower bound for 
going from a safety LTL formula to a nondeterministic automaton for its bad prefixes. The 
idea is similar: while the proof in Theorem 2 uses the exponential lower bound for going 
from nondeterministic to deterministic Büchi automata, the proof for this case is a variant
of the doubly exponential lower bound for going from LTL formulas to deterministic
Büchi automata [KV98].

Theorem 3. Given a safety LTL formula $\psi$, the size of a nondeterministic Büchi automaton
for $\mathit{pref}(\psi)$ is doubly exponential in the length of $\psi$.

In order to get the upper bound in Theorem 3, we apply the exponential construction in
Theorem 1 to the exponential Büchi automaton $A_\psi$ for $\|\psi\|$. The construction in Theorem 1
is based on a subset construction for $A_\psi$, and it requires a check for the universality of
sets of states Q of $A_\psi$. Such a check corresponds to a validity check for a DNF formula
in which each disjunct corresponds to a state in Q. While the size of the formula can be
exponential in $|\psi|$, the number of distinct literals in the formula is at most linear in $|\psi|$,
implying that the universality of Q can be checked using space polynomial in $|\psi|$.

Given a safety formula $\psi$, we say that a nondeterministic automaton A over finite
words is tight for $\psi$ iff $\mathcal{L}(A) = \mathit{pref}(\|\psi\|)$. In view of the lower bounds proven above,
a construction of tight automata may be too expensive. We say that a nondeterministic
automaton A over finite words is fine for $\psi$ iff there exists $X \in \mathit{trap}(\|\psi\|)$ such that
$\mathcal{L}(A) = X$. Thus, a fine automaton need not accept all the bad prefixes, yet it must accept
at least one bad prefix of every computation that does not satisfy $\psi$. In practice, almost
all the benefit that one obtains from a tight automaton can also be obtained from a fine
automaton (we will get back to this point in Section 6). It is an open question whether
there are feasible constructions of fine automata for general safety formulas. In Section 5
we show that for natural safety formulas $\psi$, the construction of an automaton fine for $\psi$
is as easy as the construction of an automaton for $\psi$.

4 Symbolic Verification of Safety Properties

Our construction of tight automata reduces the problem of verification of safety properties 
to the problem of invariance checking, which is amenable to a large variety of techniques. 
In particular, backward and forward symbolic reachability analysis have proven to be 
effective techniques for checking invariant properties on systems with large state spaces 
[BCM+92,IN97]. In practice, however, the verified systems are often very large, and 
even clever symbolic methods cannot cope with the state-explosion problem that model 
checking faces. In this section we describe how the way we construct tight automata
enables, in case the BDDs constructed during the symbolic reachability test get too big, an 
analysis of the intermediate data that has been collected. The analysis solves the model- 
checking problem without further traversal of the system. 

Consider a system $M = \langle AP, W, R, W_0, L \rangle$, where W is the set of states, $R \subseteq W \times W$
is a transition relation, $W_0$ is a set of initial states, and $L : W \to 2^{AP}$ maps each state to
the set of atomic propositions that hold in it. Let fin(M) be an automaton that accepts all
finite computations of M. Given $\psi$, let $A_{\neg\psi}$ be the nondeterministic co-safety automaton
for $\neg\psi$, thus $\mathcal{L}(A_{\neg\psi}) = \|\neg\psi\|$. In the proof of Theorem 2, we construct an automaton $A'$
such that $\mathcal{L}(A') = \mathit{pref}(\psi)$ by following the subset construction of $A_{\neg\psi}$ and defining the
set of accepting states to be the set of universal sets in $A_{\neg\psi}$. Then, one needs to verify
the invariance that the product $\mathit{fin}(M) \times A'$ never reaches an accepting state of $A'$. In
addition to forward and backward symbolic reachability analysis, one could use a variety
of recent techniques for doing semi-exhaustive reachability analysis [RS95,YSAA97],
including standard simulation techniques [LWA98]. Note, however, that if $A'$ is doubly
exponential in $|\psi|$, the BDD representation of $A'$ will use exponentially (in $|\psi|$) many
Boolean variables.

Another approach is to apply forward reachability analysis to the product $M \times A_{\neg\psi}$
of the system M and the automaton $A_{\neg\psi}$. Formally, let $A_{\neg\psi} = \langle 2^{AP}, Q, \delta, Q_0, F \rangle$, and
let M be as above. The product $M \times A_{\neg\psi}$ has state space $W \times Q$, and the successors
of a state $\langle w, q \rangle$ are all pairs $\langle w', q' \rangle$ such that $R(w, w')$ and $q' \in \delta(q, L(w))$. Forward
symbolic methods use the predicate post(S), which, given a set S of states (represented
symbolically), returns the successor set of S, that is, the set of all states t such that there
is a transition from some state in S to t. Starting from the initial set $S_0 = W_0 \times Q_0$,
forward symbolic methods iteratively construct, for $i \geq 0$, the set $S_{i+1} = \mathit{post}(S_i)$. The
calculation of the $S_i$'s proceeds symbolically, and they are represented by BDDs. Doing so,
forward symbolic methods actually follow the subset construction of $M \times A_{\neg\psi}$. Indeed,
for each $w \in W$, the set $Q_i^w = \{q : \langle w, q \rangle \in S_i\}$ is the set of states of $A_{\neg\psi}$ that can be
reached via a path of length i in M from a state in $W_0$ to the state w. Note that this set
can be exponentially (in $|\psi|$) large, resulting possibly in a large BDD; on the other hand,
the number of Boolean variables used to represent $A_{\neg\psi}$ is linear in $|\psi|$.

The discussion above suggests the following technique for the case we encounter
space problems. Suppose that at some point the BDD for $S_i$ gets too big. We then check
whether there is a state w such that the set $Q_i^w$ is universal. As discussed in Section 3, we
can check the universality of $Q_i^w$ in space polynomial in $|\psi|$. Note that we do not need to
enumerate all states w and then check $Q_i^w$. We can enumerate directly the sets $Q_i^w$, whose
number is at most doubly exponential in $|\psi|$. It can be shown that $M \times A_{\neg\psi}$ is nonempty
iff $Q_i^w$ is universal for some w and $i \geq 0$, thus this check solves the model-checking
problem without further traversal of the system.
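
The forward iteration and the per-state universality check can be sketched with explicit sets in place of BDDs. The Python code below is an illustration under stated assumptions, not the symbolic procedure itself: `R` maps each system state to its set of successors, `L` maps each state to a frozenset of propositions, `delta` maps (automaton state, label) pairs to sets of automaton states, and `is_universal` is a caller-supplied test, since deciding universality is expensive in general. All names are hypothetical. The sketch also performs the check at every level rather than only when memory runs out.

```python
def post(S, R, L, delta):
    """Successors in the product of a set S of (w, q) pairs."""
    out = set()
    for (w, q) in S:
        for q2 in delta.get((q, L[w]), set()):   # automaton reads the label of w
            for w2 in R.get(w, set()):
                out.add((w2, q2))
    return out


def state_sets(S):
    """Group S by system state: maps each w to the set of q with (w, q) in S."""
    groups = {}
    for (w, q) in S:
        groups.setdefault(w, set()).add(q)
    return {w: frozenset(Q) for w, Q in groups.items()}


def forward_check(W0, Q0, R, L, delta, is_universal):
    """Return (i, w) for the first universal set found, or None if none exists."""
    S = frozenset((w, q) for w in W0 for q in Q0)
    seen_levels = set()
    i = 0
    while S not in seen_levels:          # stop once a level repeats
        seen_levels.add(S)
        for w, Q in state_sets(S).items():
            if is_universal(Q):
                return i, w
        S = frozenset(post(S, R, L, delta))
        i += 1
    return None


# Example: the property "always p"; the automaton for its negation moves to
# the state "qacc" (which accepts everything) on any letter that lacks p.
P, E = frozenset({"p"}), frozenset()
DELTA = {("q0", P): {"q0"}, ("q0", E): {"qacc"},
         ("qacc", P): {"qacc"}, ("qacc", E): {"qacc"}}
BAD_SYSTEM = dict(R={"w0": {"w1"}, "w1": {"w2"}, "w2": {"w2"}},
                  L={"w0": P, "w1": P, "w2": E})
GOOD_SYSTEM = dict(R={"w0": {"w0"}}, L={"w0": P})


def contains_qacc(Q):
    """Universality test that is valid for this example automaton only."""
    return "qacc" in Q
```

In the bad system the violation is detected at level 3, after the labels of `w0`, `w1`, and `w2` have been read; in the good system the levels repeat immediately and no universal set is found.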

Note that it is possible to use semi-exhaustive reachability techniques also when
analyzing $M \times A_{\neg\psi}$. That is, instead of taking $S_{i+1}$ to be $\mathit{post}(S_i)$ we can take it to
be a subset $S'_{i+1}$ of $\mathit{post}(S_i)$ [RS95,YSAA97]. We have to ensure, however, that $S'_{i+1}$ is
saturated with respect to states of $A_{\neg\psi}$ [LWA98]. Informally, we are allowed to drop states
of M from $S_{i+1}$, but we are not allowed to drop states of $A_{\neg\psi}$. Formally, if $\langle w, q \rangle \in S'_{i+1}$
and $\langle w, q' \rangle \in S_{i+1}$, then $\langle w, q' \rangle \in S'_{i+1}$. This ensures that if the semi-exhaustive analysis
follows a bad prefix of length i in M, then $Q_i^w = \{q : \langle w, q \rangle \in S'_i\}$ will be universal. In
the extreme case, we follow only one trace of M, i.e., we simulate M. In that case, we
have that $S'_i = \{w\} \times Q_i^w$. For a related approach see [CES97].

5 Classification of Safety Properties 

Consider the safety LTL formula $Gp$. A bad prefix x for $Gp$ must contain a state in which
p does not hold. If the user gets x as an error trace, he can immediately understand why
$Gp$ is violated. Consider now the LTL formula $\psi = G(p \vee (Xq \wedge X\neg q))$. The formula
$\psi$ is equivalent to $Gp$ and is therefore a safety formula. Moreover, the sets of bad prefixes
for $\psi$ and $Gp$ coincide. Nevertheless, a minimal bad prefix for $\psi$ (e.g., a single state in
which p does not hold) does not tell the whole story about the violation of $\psi$. Indeed,
the latter depends on the fact that $Xq \wedge X\neg q$ is unsatisfiable, which (especially in more
complicated examples) may not be trivially noticed by the user. This intuition, of a prefix
that "tells the whole story", is the basis for a classification of safety properties into three
distinct safety levels. We first formalize this intuition in terms of informative prefixes. We
assume that LTL formulas are given in positive normal form, where negation is applied
only to propositions (when we write $\neg\psi$, we refer to its positive normal form). In the
positive normal form, we use the operator V as dual to the operator U, and use $cl(\psi)$ to
denote the closure of $\psi$, namely, the set of $\psi$'s subformulas.

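For an LTL formula $\psi$ and a finite computation $\pi = \sigma_1 \cdot \sigma_2 \cdots \sigma_n$, with $\sigma_i \in 2^{AP}$,
we say that $\pi$ is informative for $\psi$ iff there exists a mapping $L : \{1, \ldots, n+1\} \to 2^{cl(\neg\psi)}$
such that the following hold: (1) $\neg\psi \in L(1)$. (2) $L(n+1)$ is empty. (3) For all $1 \leq i \leq n$
and $\varphi \in L(i)$, the following hold.

- If $\varphi$ is a propositional assertion, it is satisfied by $\sigma_i$.

- If $\varphi = \varphi_1 \vee \varphi_2$ then $\varphi_1 \in L(i)$ or $\varphi_2 \in L(i)$.

- If $\varphi = \varphi_1 \wedge \varphi_2$ then $\varphi_1 \in L(i)$ and $\varphi_2 \in L(i)$.

- If $\varphi = X\varphi_1$, then $\varphi_1 \in L(i+1)$.

- If $\varphi = \varphi_1 U \varphi_2$, then $\varphi_2 \in L(i)$ or [$\varphi_1 \in L(i)$ and $\varphi_1 U \varphi_2 \in L(i+1)$].

- If $\varphi = \varphi_1 V \varphi_2$, then $\varphi_2 \in L(i)$ and [$\varphi_1 \in L(i)$ or $\varphi_1 V \varphi_2 \in L(i+1)$].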

Note that the emptiness of L(n + 1) guarantees that all the requirements imposed by 
are fulfilled along tv. For example, while the finite computation {p} • 0 is informative 
for Gp (with L(l) = {L'-'p},L(2) = {F->p, -ip}, and L(3) = 0), it is not informative 
for = G{p V {Xq A X^q)). Indeed, as = F{^p A {X^qV Xq)), an informative 
prefix for must contain at least one state after the first state in which -ip holds. 
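The conditions on the mapping $L$ can be checked mechanically once a candidate labeling is given. The following is a minimal sketch, not taken from the paper: the tuple encoding of formulas, the function names, and the direct treatment of $F\varphi$ (as shorthand for $true\,U\,\varphi$, with $true$ regarded as trivially satisfied) are assumptions made for illustration.

```python
# Assumed encoding of positive-normal-form formulas as nested tuples:
#   ("ap", "p") / ("nap", "p")      proposition / negated proposition
#   ("or", f, g), ("and", f, g), ("X", f), ("U", f, g), ("V", f, g)
#   ("F", f)                        shorthand for true U f (assumption of this sketch)

def holds(literal, state):
    """Evaluate a (possibly negated) proposition in a state (a set of propositions)."""
    kind, name = literal
    return (name in state) if kind == "ap" else (name not in state)

def is_informative_labeling(neg_psi, word, L):
    """Check conditions (1)-(3) for word = [sigma_1..sigma_n] and L = [L(1)..L(n+1)]."""
    n = len(word)
    if len(L) != n + 1 or neg_psi not in L[0] or L[n]:
        return False                      # (1) or (2) fails
    for i in range(n):
        here, nxt = L[i], L[i + 1]
        for f in here:
            op = f[0]
            if op in ("ap", "nap"):
                ok = holds(f, word[i])
            elif op == "or":
                ok = f[1] in here or f[2] in here
            elif op == "and":
                ok = f[1] in here and f[2] in here
            elif op == "X":
                ok = f[1] in nxt
            elif op == "F":
                ok = f[1] in here or f in nxt
            elif op == "U":
                ok = f[2] in here or (f[1] in here and f in nxt)
            elif op == "V":
                ok = f[2] in here and (f[1] in here or f in nxt)
            else:
                raise ValueError(f"unknown operator: {op}")
            if not ok:
                return False
    return True

# The example from the text: {p} . {} is informative for Gp.
not_p = ("nap", "p")
f_not_p = ("F", not_p)
word = [{"p"}, set()]
labeling = [{f_not_p}, {f_not_p, not_p}, set()]
example_ok = is_informative_labeling(f_not_p, word, labeling)
```

The checker only validates a given labeling; searching for a labeling (and hence deciding informativeness) would need an additional enumeration over subsets of $cl(\neg\psi)$, which is omitted here.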

We distinguish between three types of safety formulas. 

- A safety formula $\psi$ is intentionally safe iff all the bad prefixes for $\psi$ are informative.
For example, the formula $Gp$ is intentionally safe.

- A safety formula $\psi$ is accidentally safe iff not all the bad prefixes for $\psi$ are informative,
but every computation that violates $\psi$ has an informative bad prefix. For example, the
formulas $G(q \vee XGp) \wedge G(r \vee XG\neg p)$ and $G(p \vee (Xq \wedge X\neg q))$ are accidentally
safe.

- A safety formula $\psi$ is pathologically safe if there is a computation that violates $\psi$
and has no informative bad prefix. For example, the formula
$[G(q \vee GFp) \wedge G(r \vee GF\neg p)] \vee Gq \vee Gr$ is pathologically safe.

Sistla has shown that all temporal formulas in positive normal form constructed with 
the temporal connectives X and V are safety formulas [Sis94]. We call such formulas 
syntactically safe. The following strengthens Sistla’s result. 

Theorem 4. If $\psi$ is syntactically safe, then $\psi$ is intentionally or accidentally safe.

Given an LTL formula $\psi$ in positive normal form, one can build an alternating Büchi
automaton $A_\psi = \langle 2^{AP}, Q, \delta, Q_0, F \rangle$ such that $\mathcal{L}(A_\psi) = \|\psi\|$. Essentially, each state
of $A_\psi$ corresponds to a subformula of $\psi$, and its transitions follow the semantics of
LTL [Var96]. We define the alternating Büchi automaton $A^{true}_\psi = \langle 2^{AP}, Q, \delta, Q_0, \emptyset \rangle$ by
redefining the set of accepting states to be the empty set. So, while in $A_\psi$ a copy of the
automaton may accept by either reaching a state from which it proceeds to true or visiting
states of the form $\varphi_1 V \varphi_2$ infinitely often, in $A^{true}_\psi$ all copies must reach a state from
which they proceed to true. Accordingly, $A^{true}_{\neg\psi}$ accepts exactly those computations that
have a finite prefix that is informative for $\psi$. To see this, note that such computations
can be accepted by a run of $A_{\neg\psi}$ in which all the copies eventually reach a state that is
associated with propositional assertions that are satisfied. Now, let $fin(A^{true}_{\neg\psi})$ be $A^{true}_{\neg\psi}$
when regarded as an automaton on finite words.

Theorem 5. For every safety formula $\psi$, the automaton $fin(A^{true}_{\neg\psi})$ accepts exactly all the
prefixes that are informative for $\psi$.

Corollary 1. Consider a safety formula $\psi$. If $\psi$ is intentionally safe, then $fin(A^{true}_{\neg\psi})$ is
tight for $\psi$. Also, if $\psi$ is accidentally safe, then $fin(A^{true}_{\neg\psi})$ is fine for $\psi$.

Theorem 6. Deciding whether a given formula is pathologically safe is PSPACE-complete.



Proof. Consider a formula $\psi$. Recall that the automaton $A^{true}_{\neg\psi}$ accepts exactly those
computations that have a finite prefix that is informative for $\psi$. Hence, $\psi$ is not pathologically
safe iff every computation that does not satisfy $\psi$ is accepted by $A^{true}_{\neg\psi}$. Accordingly,
checking whether $\psi$ is pathologically safe can be reduced to checking the containment of
$\mathcal{L}(A_{\neg\psi})$ in $\mathcal{L}(A^{true}_{\neg\psi})$. Since the size of $A_\psi$ is linear in the length of $\psi$ and containment for
alternating Büchi automata can be checked in polynomial space [KV97], we are done. For
the lower bound, we do a reduction from the problem of deciding whether a given formula
is a safety formula. Consider a formula $\psi$, and let $p$, $q$, and $r$ be atomic propositions not
in $\psi$. The formula $\varphi = [G(q \vee GFp) \wedge G(r \vee GF\neg p)] \vee Gq \vee Gr$ is pathologically safe.
It can be shown that $\psi$ is a safety formula iff $\psi \wedge \varphi$ is pathologically safe.



Note that the lower bound in Theorem 6 implies that the reverse direction of Theorem 4
does not hold.

6 A Methodology

In Section 5, we partitioned safety formulas into three safety levels and showed that for
some formulas, we can circumvent the blow-up involved in constructing a tight automaton
for the bad prefixes. In particular, we showed that the automaton $fin(A^{true}_{\neg\psi})$, which is
linear in the length of $\psi$, is tight for $\psi$ that is intentionally safe and is fine for $\psi$ that is
accidentally safe. In this section we describe a methodology for efficient verification of
safety properties that is based on these observations. Consider a system $M$ and a safety
LTL formula $\psi$. Let $fin(M)$ be a nondeterministic automaton on finite words that accepts
the prefixes of computations of $M$, and consider the nondeterministic automaton
on finite words equivalent to the alternating automaton $fin(A^{true}_{\neg\psi})$ [CKS81]. The size of
this nondeterministic automaton is exponential in the size of $fin(A^{true}_{\neg\psi})$, hence it is
exponential in the length of $\psi$.
Given $M$ and $\psi$, we suggest to proceed as follows (see the figure below).




Instead of checking the emptiness of $M \times A_{\neg\psi}$, verification starts by checking $fin(M)$
with respect to the nondeterministic automaton for $fin(A^{true}_{\neg\psi})$. Since both automata refer
to finite words, this can be done using finite forward reachability analysis. If the product of
$fin(M)$ and this automaton is not empty, we return a word $w$ in the intersection, namely, a
bad prefix for $\psi$ that is generated by $M$. If the product is empty, then, as the automaton is
fine for intentionally and accidentally safe formulas, there may be two reasons for this. One
is that $M$ satisfies $\psi$, and the second is that $\psi$ is pathologically safe. Therefore, we next
check whether $\psi$ is pathologically safe.
(Note that for syntactically safe formulas this check is unnecessary, by Theorem 4.) If it is
not pathologically safe, we conclude that $M$ satisfies $\psi$. Otherwise, we tell the user that his
formula is pathologically safe, indicating that his specification is needlessly complicated
(accidentally and pathologically safe formulas contain redundancy). At this point, the user
would probably be surprised that his formula was a safety formula (if he had known it is
safety, he would have simplified it to an intentionally safe formula; a feasible automatic
simplification of such formulas is an open problem). If the user wishes to continue with
this formula, we give up using the fact that $\psi$ is safety and proceed with usual LTL model
checking, thus we check the emptiness of $M \times A_{\neg\psi}$. (Recall that the symbolic algorithm
for emptiness of Büchi automata is in the worst case quadratic [HKSV97,TBK95].) Note
that at this point, the error trace that the user gets if $M$ does not satisfy $\psi$ consists of a
prefix and a cycle, yet since the user does not want to change his formula, he probably
has no idea why it is a safety formula and a finite non-informative error trace would not
help him. If the user prefers, or if $M$ is very large (making the discovery of bad cycles
infeasible), we can build an automaton for $pref(\psi)$, hoping that by learning it, the user
would understand how to simplify his formula or that, in spite of the potential blow-up in
$\psi$, finite forward reachability would work better.

Footnote. Note that since $\psi$ may not be intentionally safe, the automaton may not be tight
for $\psi$; thus, while the returned word $w$ is a minimal informative bad prefix, it may not be a
minimal bad prefix.
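The first step of the methodology, checking the product of two automata on finite words by forward reachability, can be sketched as a breadth-first search over pairs of states. This is an illustrative sketch only: the dictionary encoding of automata, the function name `find_common_word`, and the two toy automata are assumptions, and an explicit-state search stands in for the symbolic analysis a real model checker would use.

```python
from collections import deque

def find_common_word(nfa_a, nfa_b):
    """Return a shortest word accepted by both NFAs, or None if there is none.

    Each NFA is a dict with 'init' (set), 'final' (set) and 'delta', a dict
    mapping (state, letter) to a set of successor states (assumed encoding).
    """
    letters = sorted({l for (_, l) in nfa_a["delta"]} & {l for (_, l) in nfa_b["delta"]})
    start = [(a, b) for a in nfa_a["init"] for b in nfa_b["init"]]
    parent = {pair: None for pair in start}   # also serves as the visited set
    queue = deque(start)
    while queue:
        pair = queue.popleft()
        a, b = pair
        if a in nfa_a["final"] and b in nfa_b["final"]:
            word = []
            while parent[pair] is not None:   # walk back to an initial pair
                pair, letter = parent[pair]
                word.append(letter)
            return word[::-1]
        for letter in letters:
            for a2 in nfa_a["delta"].get((a, letter), ()):
                for b2 in nfa_b["delta"].get((b, letter), ()):
                    if (a2, b2) not in parent:
                        parent[(a2, b2)] = (pair, letter)
                        queue.append((a2, b2))
    return None

# Toy system: emits p, then possibly not_p once, then p forever (all prefixes accepted).
fin_m = {"init": {"s0"}, "final": {"s0", "s1", "s2"},
         "delta": {("s0", "p"): {"s1"}, ("s1", "p"): {"s1"},
                   ("s1", "not_p"): {"s2"}, ("s2", "p"): {"s2"}}}
# Toy automaton for the bad prefixes of Gp: accept once a not_p state is read.
bad_gp = {"init": {"q"}, "final": {"bad"},
          "delta": {("q", "p"): {"q"}, ("q", "not_p"): {"bad"},
                    ("bad", "p"): {"bad"}, ("bad", "not_p"): {"bad"}}}
witness = find_common_word(fin_m, bad_gp)
```

A non-`None` result plays the role of the returned word $w$; a `None` result corresponds to the empty-product branch, after which the pathological-safety check described above would be needed.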

Acknowledgement. The second author is grateful to Avner Landver for stimulating
discussions.

Abstract. In this paper we show how to use McMillan’s complete finite
prefix approach for process algebra. We present the model of component 
event structures as a semantics for process algebra, and show how to 
construct a complete finite prefix for this model. We present a simple 
adequate order (using an order on process algebra expressions) as an 
optimization to McMillan’s original algorithm. 



1 Introduction 

A major problem in the verification of distributed systems is the state explosion
problem. This problem arises when modelling a system that consists of parallel
subsystems: the model has a number of states of the same order of magnitude as
the product of the numbers of states of the subsystems.

In process algebra (e.g. [Hoa85,BB87,BW90]) state explosion may occur when
using the standard interleaving semantics. In order to deal with this problem
one line of research has been to look for alternative semantic models based on
partial orders, of which event structures [Win89,BC94,Lan92] are a prominent
example. Event structures can be used as a semantics for process algebra and are
easily extended with timing, probabilistic and stochastic information [BKLL98,
KLL+98]. A problem with event structures, though, is that recursion leads to
infinite structures, whereas for techniques like model checking it is important to
have finite representations of infinite behaviour.

An interesting direction of research has been initiated by McMillan, originally for 
finite state Petri nets [McM92,McM95a,McM95b]. He has presented an algorithm 
for constructing an initial part of the occurrence net [NPW81,Eng91] of a Petri 
net which contains all information on reachable states and transitions. This so- 
called complete finite prefix can be used as the basis for model checking [Esp94, 
Gra97,Wal98]. 

In this paper we explore how this McMillan complete finite prefix approach 
can be used in giving an event structure semantics to process algebra. Using a 
translation of process algebra into Petri nets (as has been done in [Old91]) would
pose severe complications when calculating a prefix. The translation there makes 
use of a trick for dealing with the choice operator; this has as a side effect that 
not all reachable markings correspond in a clear way to reachable process algebra 
expressions. This would greatly complicate the computation of a finite prefix. 
Therefore we directly translate a process algebra expression into a model similar 
to an occurrence net in which choice can be modelled in a natural way.

The paper is organized as follows. In section 2 we present a process algebra
and a model called component event structures which is to process algebra what 
occurrence nets are to Petri nets. In section 3 we use this model as a semantics 
for process algebra and in section 4 we show how to construct a complete finite 
prefix for this model. In section 5 we present an optimization to the McMillan 
algorithm which has the same advantages as a proposal in [ERV97] but profits 
from the process algebra setting. Section 6 is for conclusions. 

An extended version of this paper (containing all proofs) can be found in [LB99]. 

2 Process Algebra and Component Event Structures 

This paper uses a simple process algebra with a parallel operator similar to the
one from CSP [Hoa85] or LOTOS [BB87]. The syntax is given by the following
grammar:

$B ::= stop \mid a;B \mid B + B \mid B \,|_A\, B \mid P$

The inaction process $stop$ cannot do anything. Action prefix is denoted by $a;B$
where $a \in Act$, with $Act$ a set of actions (a distinction between observable and
invisible actions plays no role in this paper). The choice between $B_1$ and $B_2$ is
denoted by $B_1 + B_2$. Parallel composition is denoted by $B_1 \,|_A\, B_2$ where $A$ is
the set of synchronizing actions; $B_1 \,|_\emptyset\, B_2$ is abbreviated to $B_1 \,|\, B_2$. Finally, $P$
denotes process instantiation where a behaviour expression is assumed to be in
the context of a set of process definitions of the form $P := B$ with $B$ possibly
containing process instantiations of $P$.

A process algebra expression can be decomposed into so-called components, 
which are action prefix expressions together with information about the synchro- 
nization context. This approach has been inspired by the Petri net semantics for 
process algebra presented in [Old91].

Definition 1. A component $S$ is defined by
$S ::= stop \mid a;B \mid S|_A \mid {|_A}S$ with $B$ a process algebra expression; the
universe of all components is denoted by $Comp$.

Convention: let $\mathcal{S} = \{S_1, \ldots, S_n\}$ be a set of components, then we use the
notation $\mathcal{S}|_A = \{S_1|_A, \ldots, S_n|_A\}$, and similarly for ${|_A}\mathcal{S}$.

Components can be in a choice relation which will be used to model the effect 
of the process algebra choice operator. 

Definition 2. A state is a tuple $(\mathcal{S}, \mathcal{R})$ with $\mathcal{S}$ a set of components, and $\mathcal{R}$ an
irreflexive and symmetric relation between components (so $\mathcal{R} \subseteq \mathcal{S} \times \mathcal{S}$) called
the choice relation.

Convention: let $\mathcal{R} = \{(S_1, S'_1), \ldots, (S_n, S'_n)\}$ be a choice relation, then we use
the notation $\mathcal{R}|_A = \{(S_1|_A, S'_1|_A), \ldots, (S_n|_A, S'_n|_A)\}$ and similarly for ${|_A}\mathcal{R}$.

Components (and the choice relation between them) can be obtained by
decomposing a process algebra expression.

Definition 3. The decomposition function $dec$, which maps a process algebra
expression on a state, is recursively defined by $dec(B) = (\mathcal{S}(B), \mathcal{R}(B))$ with

$dec(stop) = (\{stop\}, \emptyset)$
$dec(a;B) = (\{a;B\}, \emptyset)$
$dec(B \,|_A\, B') = (\mathcal{S}(B)|_A \cup {|_A}\mathcal{S}(B'),\ \mathcal{R}(B)|_A \cup {|_A}\mathcal{R}(B'))$
$dec(B + B') = (\mathcal{S}(B) \cup \mathcal{S}(B'),\ \mathcal{R}(B) \cup \mathcal{R}(B') \cup (\mathcal{S}(B) \times \mathcal{S}(B')))$
$dec(P_\varphi) = dec(\varphi(B))$ if $P := B$

In order to avoid that the decomposition of a process instantiation leads to an 
infinite chain of substitutions, we have to adopt the constraint that all process 
definitions are guarded (see e.g. [BW90]). 
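The recursive structure of $dec$ translates directly into code. The sketch below is an illustration under stated assumptions rather than the authors' implementation: expressions and components are encoded as nested tuples, the choice relation is stored as a set of unordered pairs (so symmetry holds by construction), process indices are ignored, and the name `defs` for the table of process definitions is hypothetical.

```python
def dec(expr, defs):
    """Decompose an expression into (components, choice relation).

    Assumed encoding: ("stop",), ("prefix", a, B), ("choice", B1, B2),
    ("par", A, B1, B2) with A a frozenset of actions, ("inst", "P").
    The choice relation is a set of frozensets {c1, c2} (unordered pairs).
    """
    kind = expr[0]
    if kind in ("stop", "prefix"):
        return {expr}, set()
    if kind == "inst":
        # Recursion stops at action prefixes, so this terminates
        # when all process definitions are guarded.
        return dec(defs[expr[1]], defs)
    if kind == "choice":
        s1, r1 = dec(expr[1], defs)
        s2, r2 = dec(expr[2], defs)
        cross = {frozenset((c1, c2)) for c1 in s1 for c2 in s2 if c1 != c2}
        return s1 | s2, r1 | r2 | cross
    if kind == "par":
        _, sync, left, right = expr
        s1, r1 = dec(left, defs)
        s2, r2 = dec(right, defs)
        tag_l = lambda c: ("left", sync, c)    # stands for S|_A
        tag_r = lambda c: ("right", sync, c)   # stands for |_A S
        comps = {tag_l(c) for c in s1} | {tag_r(c) for c in s2}
        rel = ({frozenset(map(tag_l, pair)) for pair in r1}
               | {frozenset(map(tag_r, pair)) for pair in r2})
        return comps, rel
    raise ValueError(f"unknown expression kind: {kind}")

a_stop = ("prefix", "a", ("stop",))
b_stop = ("prefix", "b", ("stop",))
components, choice = dec(("choice", a_stop, b_stop), {})
```

For `a;stop + b;stop` this yields the two prefix components and a single choice pair between them. Without process or action indices, syntactically identical components on both sides of a choice would collapse into one set element, which is one reason an indexed representation is useful.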

We define an event structure model which is very similar to a type of Petri nets 
called occurrence nets [NPW81,Eng91]; the main difference is that there are no 
tokens, and conditions can be in a binary choice relation. 

Definition 4. A condition event structure is a 4-tuple $(D, E, \sharp, \prec)$ with:
- $D$ a set of conditions
- $E$ a set of events
- $\sharp \subseteq D \times D$, the choice relation (symmetric and irreflexive)
- $\prec \subseteq (D \times E) \cup (E \times D)$ the flow relation

We adopt some Petri net terminology: a marking is a set of conditions. A node is
either a condition or an event. The preset of a node $x$, denoted by ${}^\bullet x$, is defined
by ${}^\bullet x = \{y \in D \cup E \mid y \prec x\}$, and the postset $x^\bullet$ by $x^\bullet = \{y \in D \cup E \mid x \prec y\}$.
The initial marking $M_0$ is defined by $M_0 = \{d \in D \mid {}^\bullet d = \emptyset\}$.

Definition 5. The transitive and reflexive closure of $\prec$ is denoted by $\le$.

The conflict relation on nodes, denoted by $\#$, is defined by: let $x_1$ and $x_2$ be
two different nodes, then $x_1 \,\#\, x_2$ iff there are two nodes $y_1$ and $y_2$, such that
$y_1 \le x_1$ and $y_2 \le x_2$, with

- either $y_1$ and $y_2$ are two conditions in the choice relation, i.e. $y_1 \,\sharp\, y_2$

- or $y_1$ and $y_2$ are two events with ${}^\bullet y_1 \cap {}^\bullet y_2 \neq \emptyset$



Definition 6. A condition event structure is well-formed if the following
properties hold:

1. $\le$ is anti-symmetric, i.e. $x \le x' \wedge x' \le x \Rightarrow x = x'$

2. finite precedence, i.e. for each node $x$ the set $\{y \in E \cup D \mid y \le x\}$ is finite

3. no self-conflict, i.e. for each node $x$: $\neg(x \,\#\, x)$

4. for each event $e$: ${}^\bullet e \neq \emptyset$ and $e^\bullet \neq \emptyset$

5. for each condition $d$: $|{}^\bullet d| \le 1$

6. for all conditions $d_1$ and $d_2$: $d_1 \,\sharp\, d_2 \Rightarrow {}^\bullet d_1 = {}^\bullet d_2$

Let $d$ be a condition, then we define the set of conditions in choice with $d$
by $\sharp(d) = \{d' \mid d \,\sharp\, d'\}$. Similarly for a set of conditions $\mathcal{D}$, $\sharp(\mathcal{D}) = \{d' \mid \exists d \in \mathcal{D} :
d \,\sharp\, d'\}$.

Definition 7. Suppose we have a condition event structure, with $e$ an event,
and $M$ and $M'$ markings, then we say there is an event transition $M \xrightarrow{e} M'$ iff
${}^\bullet e \subseteq M$ and $M' = (M \cup e^\bullet) \setminus ({}^\bullet e \cup \sharp({}^\bullet e))$ (note there are no loops in well-formed
condition event structures).

An event sequence is a sequence of events $e_1 \ldots e_n$ such that there are markings
$M_1, \ldots, M_n$ with $M_0 \xrightarrow{e_1} M_1 \cdots \xrightarrow{e_n} M_n$. We call $C = \{e_1, \ldots, e_n\}$ a
configuration of the condition event structure.
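The event transition rule of Definition 7 is a small set computation. The following sketch assumes that presets, postsets and the choice relation are available as dictionaries; these names and the toy structure are hypothetical and only serve to illustrate the rule.

```python
def fire(marking, event, preset, postset, choice):
    """Return M' such that M -e-> M', or None if the event is not enabled.

    preset / postset: dict mapping an event to its set of conditions;
    choice: dict mapping a condition to the set of conditions in choice with it.
    """
    pre = preset[event]
    if not pre <= marking:            # enabled iff the preset is contained in M
        return None
    in_choice = set().union(*(choice.get(d, set()) for d in pre))
    # M' = (M u e*) \ (*e u choice(*e))
    return (marking | postset[event]) - (pre | in_choice)

# Toy structure for a;stop + b;stop: d1 and d2 are in choice.
preset = {"ea": {"d1"}, "eb": {"d2"}}
postset = {"ea": {"d3"}, "eb": {"d4"}}
choice = {"d1": {"d2"}, "d2": {"d1"}}
after_a = fire({"d1", "d2"}, "ea", preset, postset, choice)
```

Firing `ea` removes not only its own precondition `d1` but also `d2`, which is in choice with it, so `eb` can no longer occur afterwards.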

Definition 8. Two nodes $x$ and $x'$ are said to be independent, notation $x \smile x'$,
iff $\neg(x \le x') \wedge \neg(x' \le x) \wedge \neg(x \,\#\, x')$.

Definition 9. A cut is a marking $M$ such that for each pair of different
conditions $d$ and $d'$ in $M$ holds: $d \smile d'$ or $d \,\sharp\, d'$, and that is maximal (w.r.t. set
inclusion).

Theorem 1. Let $C$ be a configuration and $M$ a cut. Define
$Cut(C) = (M_0 \cup C^\bullet) \setminus ({}^\bullet C \cup \sharp({}^\bullet C))$ and $Conf(M) = \{e \in E \mid \exists d \in M : e \le d\}$.
Then: $Cut(C)$ is a cut, $Conf(M)$ is a configuration, $Conf(Cut(C)) = C$ and
$Cut(Conf(M)) = M$.

Definition 10. A condition event structure $\mathcal{E} = (D, E, \sharp, \prec)$ with mappings

$l_C : D \rightarrow Comp$ (mapping conditions to components)

$l_E : E \rightarrow Act$ (mapping events to actions)
is called a component event structure.

We will often be sloppy and denote a condition by its component label (but note 
that different conditions may be labelled with the same component). 

3 Component Event Structures as Semantics for Process 
Algebra 

In this section we define a component event structure as a semantics for a process
algebra expression with the help of a derivation system for transitions (again
inspired by [Old91]). This derivation system will allow derivations of transitions
of the form $\mathcal{S} \xrightarrow{a} (\mathcal{S}', \mathcal{R}')$, where $\mathcal{S}$ and $\mathcal{S}'$ are sets of components, and $\mathcal{R}'$ is a
choice relation over components $\mathcal{S}'$. The rules are given in table 1.

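Definition 11. Let $\mathcal{E}$ be a component event structure. The possible extensions
of $\mathcal{E}$, denoted by $PE(\mathcal{E})$, is the set of all pairs $(\mathcal{D}, \mathcal{S} \xrightarrow{a} (\mathcal{S}', \mathcal{R}'))$ such that:

- $\mathcal{D}$ is a set of pairwise independent conditions of $\mathcal{E}$ with $l_C(\mathcal{D}) = \mathcal{S}$

- $\mathcal{S} \xrightarrow{a} (\mathcal{S}', \mathcal{R}')$ can be derived from the rules in table 1

- $\mathcal{E}$ does not already contain an event $e$ with $l_E(e) = a$ and ${}^\bullet e = \mathcal{D}$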

For component event structures it is easy to check that if two conditions have 
the same component label they are in conflict; this means that a set of pair- 
wise independent conditions is labelled by a set of components with the same 
cardinality.

Table 1. Derivation system for component transitions




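We can add a possible extension $(\mathcal{D}, \mathcal{S} \xrightarrow{a} (\mathcal{S}', \mathcal{R}')) \in PE(\mathcal{E})$ to $\mathcal{E}$ by adding
a new event $e$ labelled $a$ and new conditions $\mathcal{D}'$ with labels from $\mathcal{S}'$, such that
${}^\bullet e = \mathcal{D}$ and $e^\bullet = \mathcal{D}'$, and a choice relation over the conditions $\mathcal{D}'$ induced by
the relation $\mathcal{R}'$ over $\mathcal{S}'$.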

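Algorithm 1. Let $B$ be a process algebra expression, with $dec(B) = (\mathcal{S}_0, \mathcal{R}_0)$.
The unfolding of $B$, denoted $Unf(B)$, is generated by the following algorithm:

Let $\mathcal{E}$ be the component event structure with conditions $M_0$,
    $l_C(M_0) = \mathcal{S}_0$, choice relation $\mathcal{R}_0$ and no events;
pe := $PE(\mathcal{E})$;
while pe $\neq \emptyset$
do select a pair $(\mathcal{D}, \mathcal{S} \xrightarrow{a} (\mathcal{S}', \mathcal{R}'))$ from pe;
   add it to $\mathcal{E}$;
   pe := $PE(\mathcal{E})$
od;
$Unf(B) = \mathcal{E}$ □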

The algorithm only terminates for expressions with finite behaviour. For
expressions with infinite behaviour, the above algorithm produces arbitrarily large
unfolding approximations (under the fairness assumption that each pair in pe
is eventually added to $\mathcal{E}$). In that case we define $Unf(B)$ as the limit of these
approximations. It is easy to prove that $Unf(B)$ is a well-formed component
event structure, i.e. the properties of definition 6 hold.

Notation: let $\mathcal{R} \subseteq \mathcal{S} \times \mathcal{S}$, and $\mathcal{S}' \subseteq \mathcal{S}$, then $\mathcal{R}\lceil\mathcal{S}' = \mathcal{R} \cap (\mathcal{S}' \times \mathcal{S}')$. Note that if $\sharp$
is the choice relation of $Unf(B)$, and $dec(B) = (M, \mathcal{R})$, then by the definition
of unfolding $\mathcal{R} = \sharp\lceil M$.

In [Lan92] it is shown how by slightly adapting the standard operational seman- 
tics it is possible to derive event sequences. In [LB99] this idea has been adapted 
to component event structures and the following result has been proven there. 

Theorem 2. Let $B$ be a process algebra expression, $Unf(B)$ its unfolding and
$M_0$ the initial marking of $Unf(B)$. Let $\sigma$ be an event trace. Then:

$B \xrightarrow{\sigma} B' \iff M_0 \xrightarrow{\sigma} M'$ with $dec(B') = (M', \sharp\lceil M')$

In the last section we saw that there is a one-to-one correspondence between
cuts and configurations via the mappings $Cut$ and $Conf$. In [Lan92] it has been
proven (Theorem 7.4.1) that there is also a one-to-one correspondence between
configurations and reachable states (where each reachable state is a process
algebra expression). It follows that there is a one-to-one correspondence between
cuts and states of some unfolding $Unf(B)$; therefore given a cut $M'$, there is a
process algebra expression $B'$ such that $dec(B') = (M', \sharp\lceil M')$. So given an
unfolding $Unf(B)$, we define a mapping $St$ from cuts to process algebra expressions
by $St(M') = B'$ where $B'$ is the reachable state corresponding to $Conf(M')$, so
$dec(B') = (M', \sharp\lceil M')$. If $C$ is a configuration, we will also write $St(C)$ for
$St(Cut(C))$.

4 A Complete Finite Prefix for Component Event Structures 

In the last section we have defined the component event structure $Unf(B)$ for a
process algebra expression $B$. This representation may be infinite for recursive
processes; we would like to have a finite representation of such behaviour.
Therefore in this section we will look at McMillan’s so-called complete finite
prefix of an unfolding, which is an initial part of the unfolding that is complete
in the following sense:

For each cut $M$ of $Unf(B)$ there is a cut $M'$ of the finite prefix such that:

- $St(M) = St(M')$, so the prefix contains all reachable states

- if $M \xrightarrow{e}$ in $Unf(B)$ with $l_E(e) = a$, then $M' \xrightarrow{e'}$ and $l_E(e') = a$, so the prefix
contains all transitions

The complete finite prefix and McMillan’s algorithm for computing it have ori- 
ginally been defined in the context of Petri nets (see [McM95b,Esp94,ERV97]). 
However, the approach (using the concepts of event, configuration and cut) can 
be transferred completely to the setting of component event structures, as we 
show here. For details and proofs we refer to [McM95b,Esp94,ERV97]. 

The complete finite prefix approach only works for finite state processes, i.e. 
processes with a finite number of reachable states. It is in general undecidable 
whether a process algebra expression is finite state. However, there exist syntac- 
tical restrictions that are sufficient to guarantee that an expression is finite state 
(see [FGM92] for discussion and overview). In the following we simply assume 
that all process algebra expressions are finite state. 

We first need some preliminary definitions where we closely follow [ERV97]. 

Let $E$ be a set of events and let $C$ be a configuration of a component event
structure. If $C \cup E$ is a configuration, and $C \cap E = \emptyset$, then we denote $C \cup E$ by
$C \oplus E$, the extension of $C$ by $E$.

Let $M$ be a marking of a (well-formed) component event structure. Define the
successor nodes of $M$ by $N = \{x \in E \cup D \mid \exists y \in M : y \le x\}$. We define
$\Uparrow M = (D \cap N, E \cap N, \sharp\lceil N, \prec\lceil N)$. It is easy to check that $\Uparrow M$ is a
well-formed component event structure.




190 R. Langerak and E. Brinksma 



It is easy to check that for a configuration $C$ the unfolding $Unf(St(C))$ is
isomorphic to $\Uparrow Cut(C)$. So if $C_1$ and $C_2$ are two configurations such that
$St(C_1) = St(C_2)$, then $\Uparrow Cut(C_1)$ and $\Uparrow Cut(C_2)$ are isomorphic. So there
is an isomorphism $I_1^2$ from $\Uparrow Cut(C_1)$ to $\Uparrow Cut(C_2)$; this induces a mapping
from the extensions of $C_1$ onto the extensions of $C_2$, so $C_1 \oplus E$ is mapped onto
$C_2 \oplus I_1^2(E)$.

The following definition presents an important technical aspect of the calculation 
of a complete finite prefix. 

Definition 12. A (strict) partial order $\sqsubset$ on the finite configurations of an
unfolding is an adequate order iff:

1. $\sqsubset$ is well-founded, i.e. there is no infinite sequence $C_1 \sqsupset C_2 \sqsupset \ldots$

2. $\sqsubset$ refines $\subset$, i.e. $C_1 \subset C_2$ implies $C_1 \sqsubset C_2$

3. $\sqsubset$ is preserved by finite extensions, which means that if $C_1 \sqsubset C_2$ and
$St(C_1) = St(C_2)$, then $C_1 \oplus E \sqsubset C_2 \oplus I_1^2(E)$.

The original algorithm by McMillan uses as adequate order the order $\sqsubset_m$ defined
by $C_1 \sqsubset_m C_2 \iff |C_1| < |C_2|$. This order is intuitively easy to understand but
can be very inefficient. An improvement has been given in [ERV97]; in the next
section we present an adequate order that is very suitable for a process algebra
prefix.

Let $e$ be an event of a component event structure, then the local configuration
$[e]$ is defined by $[e] = \{e' \in E \mid e' \le e\}$ (it is very easy to prove that $[e]$ is indeed
a configuration).
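Computing a local configuration amounts to collecting the causal past of an event. A minimal sketch, assuming the structure is given by two dictionaries (hypothetical names): `preset` maps each event to its preconditions, and `producer` maps each non-initial condition to the unique event that produced it, which exists because $|{}^\bullet d| \le 1$ in a well-formed structure.

```python
def local_configuration(event, preset, producer):
    """Return [e]: the set of all events e' with e' <= e."""
    config, stack = set(), [event]
    while stack:
        e = stack.pop()
        if e in config:
            continue
        config.add(e)
        for d in preset[e]:
            if d in producer:          # initial conditions have no producer
                stack.append(producer[d])
    return config

# Toy structure: e1 consumes d0 and produces d1; e2 consumes d1 and the initial d2;
# e3 is an alternative consumer of d0 and is not in the past of e2.
preset = {"e1": {"d0"}, "e2": {"d1", "d2"}, "e3": {"d0"}}
producer = {"d1": "e1"}
past_of_e2 = local_configuration("e2", preset, producer)
```

The size of the returned set is what McMillan's original order compares, so this helper is also the basic building block for any size-based adequate order.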

Definition 13. Let $Unf(B)$ be an unfolding and let $\sqsubset$ be the selected adequate
partial order on the configurations of $Unf(B)$. An event $e$ is a cut-off event if
$Unf(B)$ has a local configuration $[e']$ such that $St([e]) = St([e'])$ and $[e'] \sqsubset [e]$.

Definition 14. Let $X$ be the set of nodes of $Unf(B)$ such that $x \in X$ iff no
event causally preceding $x$ is a cut-off event. Then the finite prefix $Fp(B)$ of
$Unf(B) = (D, E, \sharp, \prec)$ is defined by $Fp(B) = (D \cap X, E \cap X, \sharp\lceil X, \prec\lceil X)$.

So $Fp(B)$ contains all local configurations, and stops at cut-off events since their
local configuration has been encountered already. The nice result (originally
proven by McMillan for Petri nets [McM95b]) is that this is enough to guarantee
completeness, so the prefix also contains all non-local configurations; $Fp(B)$ is
finite and complete.

Conceptually a finite prefix is obtained by taking an unfolding and cutting away 
all successor nodes of cut-off events. This is not a practical recipe; the next 
algorithm shows how to obtain directly the complete finite prefix, without first 
creating the (possibly infinite) unfolding. First we redefine the set of possible 
extensions, to make sure that no successors of cut-off events are created. 

Definition 15. Let $\mathcal{E}$ be a labelled component event structure with a set of
cut-off events cut. The possible non-cut-off extensions of $\mathcal{E}$, denoted by $PE'(\mathcal{E}, cut)$,
is the set of all pairs $(\mathcal{D}, \mathcal{S} \xrightarrow{a} (\mathcal{S}', \mathcal{R}'))$ such that $(\mathcal{D}, \mathcal{S} \xrightarrow{a} (\mathcal{S}', \mathcal{R}')) \in PE(\mathcal{E})$
and $\forall d \in \mathcal{D} : {}^\bullet d \notin cut$

Algorithm 2. Let $B$ be a process algebra expression, with $dec(B) = (\mathcal{S}_0, \mathcal{R}_0)$.
Then the finite prefix $Fp(B)$ is generated by the following algorithm:

Let $\mathcal{E}$ be the component event structure with conditions $M_0$,
    $l_C(M_0) = \mathcal{S}_0$, choice relation $\mathcal{R}_0$ and no events;
cut := $\emptyset$;
pe := $PE'(\mathcal{E}, cut)$;
while pe $\neq \emptyset$
do select a pair $(\mathcal{D}, \mathcal{S} \xrightarrow{a} (\mathcal{S}', \mathcal{R}'))$ from pe such that adding it
   leads to a new event $e$ with $[e]$ minimal w.r.t. $\sqsubset$;
   add it to $\mathcal{E}$;
   if $e$ is a cut-off event then cut := cut $\cup \{e\}$;
   pe := $PE'(\mathcal{E}, cut)$
od;
$Fp(B) = \mathcal{E}$ □

It is easy to check that $Fp(B)$ as generated by algorithm 2 contains all nodes
of $Unf(B)$ that are not causally preceded by a cut-off event, so it is indeed the
finite prefix defined by definition 14.

Example 1. Consider $B = P \,|_b\, Q$ with $P = a;b;P$ and $Q = c;b;(e;P + d;Q)$.
Then the unfolding is given in figure 1; cut-off events are indicated by a box.




Fig. 1. Example of a complete finite prefix

5 An Adequate Order for Process Algebra

As already noted in [ERV97], the original McMillan ordering $\sqsubset_m$ defined by
$C_1 \sqsubset_m C_2$ iff $|C_1| < |C_2|$ can be quite inefficient. Consider e.g. the expression
$a;P + b;P$. Now although both $a$ and $b$ lead to the same state, it is not possible
to make one of them a cut-off event as $[a]$ and $[b]$ have the same number of
events. This makes it possible to find examples in which the finite prefix has a
size that is exponential in the number of reachable states of the process algebra
expression.

In [ERV97] an adequate order has been defined that does not suffer from this 
problem. This order is total on all configurations, so whenever two local con- 
figurations have the same state this leads to a cut-off. This is an important 
improvement on the original McMillan order, but an adequate order does not 
need to be total on all configurations, in order to have this property. The order 
in [ERV97] is rather complicated as it requires operations on configurations like 
subtracting the set of minimal events of a configuration (in fact this order is 
defined on suffixes of configurations). 

In this section we define an adequate order which differs from \Zm only for con- 
figurations having the same state and the same number of events. This order is 
easy to implement as it is defined syntactically as a kind of lexicographical order 
on process algebra expressions. The order orders each pair of configurations with 
the same state, so local configurations having the same state always lead to a 
cut-off. 

We assume that initially process instantiations in an expression are indexed
with simple process indices (denoted by Greek letters) and actions are indexed
by action indices. We furthermore assume an operation $\varphi(B)$ that takes an
indexed expression $B$ and prefixes all indices with $\varphi$. We change the operational
rule for process instantiation into $(\varphi(B) \xrightarrow{a} B' \wedge P := B) \Rightarrow P_\varphi \xrightarrow{a} B'$. This
means that the process index of an instantiation $P_\varphi$ has the effect of prefixing all
indices in the defining expression of $P$ with $\varphi$, leading to process indices that
are strings of simple process indices; for details we refer to [LB99].

We assume there is an order on simple process indices; this order is arbitrary but
we have (for technical reasons) the constraint that the order should respect the
left to right order of the indices in the indexed process algebra expression that
we are interested in (so if $\alpha$ is ordered before $\beta$, it will occur as a process index
to the left of the occurrence of $\beta$). Remember that a process index is a string of
simple process indices; so the order on simple process indices induces a
lexicographical order on process indices. This lexicographical order is not well-founded
but can be used to define a well-founded order on process indices:

Definition 16. Let $\varphi_1$ and $\varphi_2$ be two process indices. We define $\varphi_1 \ll \varphi_2$ iff
either $|\varphi_1| < |\varphi_2|$, or $|\varphi_1| = |\varphi_2|$ and $\varphi_1$ is lexicographically smaller than $\varphi_2$.
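The order of Definition 16 compares length first and falls back to the lexicographical order only for indices of equal length; this is what restores well-foundedness, since the pure lexicographical order admits infinite descending chains such as $\beta$, $\alpha\beta$, $\alpha\alpha\beta$, and so on. A minimal sketch, assuming process indices are represented as tuples of simple indices and that `rank` (a hypothetical name) gives the position of each simple index in the chosen order:

```python
def index_key(index, rank):
    """Sort key realising Definition 16: length first, then lexicographic."""
    return (len(index), [rank[s] for s in index])

def precedes(index1, index2, rank):
    """True iff index1 << index2 in the sense of Definition 16."""
    return index_key(index1, rank) < index_key(index2, rank)

rank = {"alpha": 0, "beta": 1}
shorter_first = precedes(("beta",), ("alpha", "alpha"), rank)
same_length = precedes(("alpha", "beta"), ("beta", "alpha"), rank)
```

Because the key is a pair of a natural number and a fixed-length list over a finite alphabet, only finitely many indices lie below any given index.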

With this order we can define an order on process algebra expressions that are
equal modulo process identifiers; $B_1$ and $B_2$ are equal modulo process identifiers,
notation $B_1 =_P B_2$, iff after removing all process indices they are equal.

A component is of the form $stop_\varphi$ or $a_{\varphi i};B$ with $\varphi$ a process index and $i$ a
simple action index, possibly decorated with strings of parallel operators to the
left and right. In these cases we call $\varphi$ the process index of the component.

Definition 17. Let $B_1$ and $B_2$ be two different process algebra expressions with
$B_1 =_P B_2$. Then to each component of $B_1$ corresponds a component of $B_2$ that
is equal modulo process identifiers. We define $B_1 \ll B_2$ iff for the leftmost first
two corresponding components of $B_1$ respectively $B_2$ that have different process
indices $\varphi_1$ respectively $\varphi_2$ holds: $\varphi_1 \ll \varphi_2$.

Example 2. Suppose $\alpha$ comes before $\beta$ in the order on simple process indices,
then Bq,(P |q ^ct 2 ) Qatp Bq,(P |q ^/32i Q(3^<p Q-rid 

stop^ |g cii]B^ < stop^ |g ai]B^ 

With the help of $\ll$ we define the state order $\sqsubset_s$ on the configurations of $Unf(B)$:

Definition 18. Let $C_1$ and $C_2$ be two configurations of $Unf(B)$. Then $C_1 \sqsubset_s C_2$
iff $|C_1| < |C_2|$ or: $|C_1| = |C_2|$, $St(C_1) =_P St(C_2)$ and $St(C_1) \ll St(C_2)$.

Theorem 3. $\sqsubset_s$ is an adequate order on the configurations of $Unf(B)$.

Just like the adequate order presented in [ERV97] (denoted $\prec_r$ there) our order
has the property that for each pair of events $e$ and $e'$ with $St([e]) = St([e'])$:
either $[e] \sqsubset_s [e']$ or $[e'] \sqsubset_s [e]$. This has two desirable consequences:

— the number of non-cut-off events in a complete finite prefix cannot exceed 
the number of local states (i.e. states of local configurations) 

- since events are generated in accordance with $\sqsubset_s$ in algorithm 2, we need for
each newly added event $e$ only to check if there is already an event $e'$ with
$St([e]) = St([e'])$ in order to check that $e$ is a cut-off event.
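The second consequence means that cut-off detection reduces to a membership test on the states seen so far. The sketch below illustrates only this bookkeeping, under simplifying assumptions: events are supplied as a list already sorted according to the adequate order, `state_of` is a hypothetical function returning a hashable representation of $St([e])$, and the generation of possible extensions (which in algorithm 2 stops behind cut-off events) is abstracted away.

```python
def mark_cut_offs(events_in_order, state_of):
    """Return the events whose local state was already reached by an earlier event.

    events_in_order: events sorted by the adequate order on local configurations;
    state_of: maps an event e to a hashable representation of St([e]).
    """
    seen_states, cut_offs = set(), set()
    for e in events_in_order:
        state = state_of(e)
        if state in seen_states:
            cut_offs.add(e)           # an earlier event already reached this state
        else:
            seen_states.add(state)
    return cut_offs

# Toy run: e3 reaches the same state as e1 and is therefore a cut-off event.
states = {"e1": "P", "e2": "Q", "e3": "P"}
cut = mark_cut_offs(["e1", "e2", "e3"], states.get)
```

Since at most one non-cut-off event is recorded per state, the size of `seen_states` also bounds the number of non-cut-off events, matching the first consequence listed above.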

We think that in comparison with the adequate order of [ERV97] our order 
is easier to understand as it is based on a syntactical lexicographical order on 
process algebra expressions. For the same reason we expect it to be easy to
implement. This will be checked in an implementation of our algorithm that is 
currently under construction. 

6 Conclusions 

We have presented component event structures which are similar to both prime 
event structures and occurrence nets. The advantage of component event struc- 
tures over Petri nets is that the choice operator can be modelled naturally with 
the choice relation. When using Petri nets to model process algebra (as has been 
done in [Old91]) extra places have to be introduced, using a technical trick, to
model the effect of choice. This trick leads to markings that do not directly cor- 
respond to process algebra expressions (only after a kind of garbage collection) 
which would greatly complicate the construction of a complete finite prefix. Our 
component event structures do not suffer from these complications. In addition,
they are very similar to prime event structures which can be obtained by just
deleting the components.

We have shown how McMillan's approach can be used for obtaining a finite com-
plete prefix. We have presented an optimization that has the same effect as the 
one in [ERV97] but profits from the process algebra context in such a way that 
it is less complex. 

Our current research is concentrating on how the complete prefix can be trans- 
formed into a kind of graph grammar that produces the infinite behavior. This 
graph grammar representation can then be used for simulation and model check- 
ing. Furthermore, using timed, probabilistic and stochastic extensions similar to 
[KLL+98,BKLL98] we will investigate how the graph grammar can be used for 
performance modelling. We are also working on an implementation which we 
hope to finish soon.

Abstract. In this paper we elucidate the mathematical foundation underlying
both the basic and the extended forms of symbolic trajectory
evaluation (STE), with emphasis on the latter. The specific technical
contributions we make to the theory of STE are threefold. First, we
provide a satisfactory answer to the question: what does it mean for a 
circuit to satisfy a trajectory assertion? Second, we make the observation 
that STE is a form of data flow analysis and, as a corollary, propose a 
conceptually simple algorithm for extended STE. Third, we show that 
the theory of abstract interpretation based on Galois connections is the 
appropriate framework in which to understand STE. 



1 Introduction 

In BDD-based formal verification, symbolic trajectory evaluation (STE) [8,3] 
is the main alternative to symbolic model checking (SMC) [5]. Compared with 
SMC, STE has the advantage that it can be applied to very large circuits directly, 
without the need to abstract the circuits before verification. This is made possible 
by a pleasant property of STE: the number of BDD variables needed in an 
STE run depends only on the assertion being checked, not on the circuit under 
analysis. Thus one can use STE to verify a collection of assertions against the 
same circuit without having to invent a different abstraction of the circuit for 
each assertion, as one often has to do when doing SMC. On the other hand, 
what STE can verify is more restricted than what SMC can. In its basic form [8], 
STE can only verify assertions over bounded intervals of time, possibly iterated 
by non-nested loops. But in its extended form [3]^, STE can verify assertions 
expressed as arbitrary state-transition graphs, thus enabling STE to verify any 
safety properties. As far as we know, STE has not been generalized to reason 
about liveness properties. 

Unfortunately, STE seems to be much less well-known than SMC, certainly
less than it deserves to be. Partly in the hope of generating more interest in
STE, we elucidate in this paper the mathematical foundation underlying both
the basic [8] and the extended [3] forms of STE, with emphasis on the latter.
The main mathematical theories used in this paper, data flow analysis [4,6]
and abstract interpretation [1,7], are not new. And it is quite possible that
the basic ideas of this paper were already known at an intuitive level in the
STE research community. But, to the best of our knowledge, these theories and
ideas have never been brought together to form a coherent framework in which
to understand (especially the extended form of) STE. The specific technical
contributions we make to the theory of STE are threefold.

Marten van Hulst, Victor Konrad, Carl Seger, and the reviewers for comments and
encouragements, and to his wife and children for love and tolerance.

^ According to Carl Seger, the basic ideas of extended (a.k.a. generalized) STE came
out of an e-mail brainstorming session in 1994 among Derek Beatty, Randy Bryant,
and Seger on a note written by Beatty, which unfortunately was never published.

First, we clarify the semantics of STE by providing a satisfactory answer to
the following question: 

• What does it mean for a circuit to satisfy a trajectory assertion? 

More precisely, we propose to define the satisfaction relation for extended STE 
[3], in which trajectory assertions can have arbitrary state-transition graphs, as a 
universally quantified generalization of the form of basic STE [8] in which trajec- 
tory assertions are bounded sequences of states. This is not how the satisfaction 
relation for extended STE was originally defined in [3], which uses a definition 
containing both universal and existential quantifiers. To justify our definition, 
we show that it guarantees that a circuit satisfies a trajectory assertion iff (if and 
only if) the set-theoretic STE algorithm returns a positive answer, and that this 
is not the case for the definition in [3]. Another advantage of our definition is that 
it does not require us to make the distinction of whether a trajectory assertion 
is “oblivious” (which basically means “deterministic”), whereas the definition in 
[3] does. 

Second, we make the following observation: 

• STE is a form of data flow analysis (DFA).

More precisely, we show that, when properly formulated, what an STE algorithm 
computes is exactly the solution of a data flow equation in the classic format [4, 
6]. Though perhaps obvious in retrospect, this point seems to have never been 
noticed before. As a corollary of this DFA formulation, we propose a BDD-based,
completely implicit algorithm for extended STE that is very easy to understand 
and, we hope, can lead to efficient implementations of STE. (Of course, this hope 
can be confirmed or disproved only through experimentation, which is beyond 
the scope of this paper.) 

Third, we propose an appropriate framework in which to address the following 
question: 

• How is the ternary model of circuits that STE algorithms use related to the
  ordinary boolean model of circuits?

Specifically we show that the ternary model is an abstract interpretation in the 
classic sense [1,7] of the boolean model via a Galois connection [2,7]. We also 
point out a relationship between the two models (namely, the Galois connection 
should be a simulation from the boolean model to the ternary model) that seems 
to have never been articulated in the existing literature on STE [8,3]. 

The rest of this paper is organized as follows. Section 2 presents STE from
a set-theoretic viewpoint, in which circuits are modeled as functions operating
on sets of boolean vectors. Section 3 presents STE from a lattice-theoretic
viewpoint, in which circuits are modeled as functions operating on ternary vectors,
which form a lattice. Section 4 presents the conceptually simple algorithm for
extended STE mentioned above. The proofs of all theorems and reviews of
some of the mathematical machinery are relegated to the Appendices.

2 Set-Theoretic STE 

Following [3], we start with set-theoretic STE, which manipulates sets of
configurations of circuits. As will be seen, set-theoretic STE is impractical except
on small circuits. But it provides an easy-to-understand semantic foundation by
which STE can be related to symbolic model checking, which takes a set-theoretic
view of circuits. Furthermore, the development of lattice-theoretic STE in the
next section closely parallels that of set-theoretic STE.

2.1 Set-Theoretic Models of Circuits 

Consider a digital circuit M operating in discrete time. A configuration of M is 
an assignment of “values” to “signals” in M , representing a snapshot of M at a 
discrete point in time. In this section, exactly what “values” and “signals” are, 
is not important. All we need to assume is that the set of all configurations of 
M , denoted by C, is nonempty and finite. 

Circuits as Relations. The conceptually simplest model of M is a transition
relation $M_{Rel} \subseteq C \times C$, where $(c, c') \in M_{Rel}$ means that M can in one step move
from configuration $c$ to configuration $c'$. Note that since M cannot control its
input signals, $M_{Rel}$ is in general a relation rather than a function.

Circuits as Functions. The power set of $C$, denoted by $\mathcal{P}(C)$, can be viewed as
the set of predicates on configurations, where $\cap$, $\cup$, and $\subseteq$ correspond to
conjunction, disjunction, and implication, respectively. For any $Q \subseteq \mathcal{P}(C)$, we denote
by $\bigcap Q$ and $\bigcup Q$ the intersection and union of all members of $Q$, respectively.

Using the relational image operation, the transition relation $M_{Rel}$ induces a
predicate transformer $M_{Fun} \in \mathcal{P}(C) \to \mathcal{P}(C)$ in a natural way:

$$M_{Fun}(p) = \{\, c' \in C \mid \exists c \in p : (c, c') \in M_{Rel} \,\} \qquad (1)$$

for all $p \in \mathcal{P}(C)$. Intuitively, if M is in one of the configurations in $p$, then in
one step it must be in one of the configurations in $M_{Fun}(p)$. It is easy to show
from (1) that $M_{Fun}$ distributes over arbitrary union:

$$M_{Fun}(\bigcup Q) = \bigcup \{\, M_{Fun}(q) \mid q \in Q \,\} \qquad (2)$$

for all $Q \subseteq \mathcal{P}(C)$. Conversely, for any $M_{Fun} \in \mathcal{P}(C) \to \mathcal{P}(C)$ that satisfies (2),
the equivalence $(c, c') \in M_{Rel} \Leftrightarrow c' \in M_{Fun}(\{c\})$, where $c, c' \in C$, defines an
$M_{Rel} \subseteq C \times C$ that satisfies (1). Thus there is no loss of information in going
from $M_{Rel}$ to $M_{Fun}$ and vice versa.
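The passage between the relational and the functional model can be illustrated with a minimal sketch. The 2-bit circuit below (`M_REL`, `CONFIGS`) and all function names are hypothetical and chosen only for illustration; the sketch checks the pairwise form of (2), monotonicity, and the round trip from $M_{Fun}$ back to $M_{Rel}$ on that one example.

```python
from itertools import chain, combinations

def image(rel, p):
    """M_Fun(p): configurations reachable in one step from some c in p (eq. 1)."""
    return frozenset(c2 for (c1, c2) in rel if c1 in p)

def relation_from(m_fun, configs):
    """Recover M_Rel from M_Fun via (c, c') in M_Rel <=> c' in M_Fun({c})."""
    return {(c, c2) for c in configs for c2 in m_fun(frozenset({c}))}

def powerset(configs):
    """All subsets of configs, i.e. all predicates on configurations."""
    items = sorted(configs)
    return [frozenset(s) for s in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

# Hypothetical circuit with two signals packed into an integer 0..3:
# the low bit is a free input, the high bit copies the previous low bit.
CONFIGS = frozenset(range(4))
M_REL = {(c, ((c & 1) << 1) | i) for c in CONFIGS for i in (0, 1)}

def m_fun(p):
    return image(M_REL, p)

SUBSETS = powerset(CONFIGS)
# Pairwise distributivity plus preservation of the empty set gives (2)
# for every finite family Q.
distributes = all(m_fun(p | q) == m_fun(p) | m_fun(q)
                  for p in SUBSETS for q in SUBSETS)
preserves_empty = m_fun(frozenset()) == frozenset()
monotonic = all(m_fun(p) <= m_fun(q)
                for p in SUBSETS for q in SUBSETS if p <= q)
round_trip = relation_from(m_fun, CONFIGS) == M_REL
```

Because the input bit is unconstrained, each configuration has two successors, which is why the relational model is not a function on configurations while the induced transformer on sets is.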

In the remainder of this paper we will use the functional model of circuits
exclusively and drop the subscript Fun. Note that it follows from distributivity
(2) that $M$ ($= M_{Fun}$) both preserves $\emptyset$ (i.e., $M(\emptyset) = \emptyset$) and is monotonic (i.e.,
$p \subseteq q \Rightarrow M(p) \subseteq M(q)$ for all $p, q \in \mathcal{P}(C)$).

2.2 Set-Theoretic Trajectory Assertions

From now on we focus on a fixed, but arbitrary, circuit $M \in \mathcal{P}(C) \to \mathcal{P}(C)$,
where $C$ is nonempty and finite, such that (2) is true.

Definition of Trajectory Assertions. A trajectory assertion for M is a quintuple
$A = (S, s_0, R, \pi_a, \pi_c)$, where $S$ is a finite set of states, $s_0 \in S$ is an initial
state, $R \subseteq S \times S$ is a transition relation, and $\pi_a \in S \to \mathcal{P}(C)$ and $\pi_c \in S \to \mathcal{P}(C)$
label each state $s$ with an antecedent $\pi_a(s)$ and a consequent $\pi_c(s)$. Furthermore,
we assume $\forall s \in S : (s, s_0) \notin R$, for the technical reason that in formulating data
flow algorithms, it is convenient to have a unique source node whose flow value
never needs changing. No generality is lost by making this assumption.

Satisfaction of a Trajectory Assertion by a Circuit. What does it mean for
the circuit M to satisfy the trajectory assertion $A = (S, s_0, R, \pi_a, \pi_c)$? Roughly
speaking, it means that for every trajectory $\tau$ of M and every run $\rho$ of A, as
long as $\tau$ satisfies the antecedents in $\rho$, $\tau$ satisfies the consequents in $\rho$. To state
this precisely, we have to introduce some terminology. (Also, see Appendix A.1
for notations about sequences.)

A trajectory of M is a nonempty sequence of configurations, $\tau \in C^+$, such
that $\forall i \in \mathbb{N} : 0 < i < |\tau| \Rightarrow \tau[i] \in M(\{\tau[i-1]\})$; the set of trajectories of M is
denoted by $Traj(M)$. A run of A is a nonempty sequence of states, $\rho \in S^+$, such
that $\rho[0] = s_0$ and $\forall i \in \mathbb{N} : 0 < i < |\rho| \Rightarrow (\rho[i-1], \rho[i]) \in R$; the set of runs of A
is denoted by $Runs(A)$. Note that both $Traj(M)$ and $Runs(A)$ are prefix-closed.
For any $\tau \in Traj(M)$ and $\rho \in Runs(A)$ such that $|\tau| = |\rho|$, we say $\tau$ a-satisfies
(resp., c-satisfies) $\rho$, denoted by $\tau \models_a \rho$ (resp., $\tau \models_c \rho$), iff $\tau[i] \in \pi_a(\rho[i])$ (resp.,
$\tau[i] \in \pi_c(\rho[i])$) for each $i < |\tau| = |\rho|$. Finally, we say the circuit M satisfies the
trajectory assertion A, denoted by $M \models A$, iff:

$$\forall \tau \in Traj(M) : \forall \rho \in Runs(A) : |\tau| = |\rho| \Rightarrow (\, \tau \models_a \rho \Rightarrow \tau \models_c \rho \,) \qquad (3)$$
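Definition (3) can be checked by brute force on small examples. The following is a minimal sketch under stated assumptions: the inverter-style circuit `m`, the three-state assertion, and the length bound `max_len` are all hypothetical. Enumerating up to a bound is exact here only because the example assertion is acyclic, so it has no run longer than three states.

```python
def trajectories(m, configs, max_len):
    """All trajectories of M, as tuples, with 1 <= length <= max_len."""
    level = [(c,) for c in sorted(configs)]
    out = list(level)
    for _ in range(max_len - 1):
        level = [t + (c,) for t in level for c in sorted(m(frozenset({t[-1]})))]
        out.extend(level)
    return out

def runs(s0, rel, max_len):
    """All runs of A, as tuples, with 1 <= length <= max_len."""
    level = [(s0,)]
    out = list(level)
    for _ in range(max_len - 1):
        level = [r + (b,) for r in level for (a, b) in sorted(rel) if a == r[-1]]
        out.extend(level)
    return out

def sat(tau, rho, label):
    """tau a-satisfies (or c-satisfies) rho, depending on which labeling is passed."""
    return all(c in label[s] for c, s in zip(tau, rho))

def satisfies_bounded(m, configs, s0, rel, pi_a, pi_c, max_len):
    """Check (3) for all trajectories and runs of equal length up to max_len."""
    all_runs = runs(s0, rel, max_len)
    return all(sat(t, r, pi_c)
               for t in trajectories(m, configs, max_len)
               for r in all_runs
               if len(t) == len(r) and sat(t, r, pi_a))

# Hypothetical circuit: a single signal that flips at every step.
CONFIGS = frozenset({0, 1})
def m(p):
    return frozenset(1 - c for c in p)

# Hypothetical assertion: if the signal is 0 at time 0, it is 1, then 0.
REL = {("s0", "s1"), ("s1", "s2")}
ANY = {0, 1}
PI_A = {"s0": {0}, "s1": ANY, "s2": ANY}
PI_C = {"s0": ANY, "s1": {1}, "s2": {0}}
PI_C_BAD = {"s0": ANY, "s1": {1}, "s2": {1}}

holds = satisfies_bounded(m, CONFIGS, "s0", REL, PI_A, PI_C, 3)
fails = satisfies_bounded(m, CONFIGS, "s0", REL, PI_A, PI_C_BAD, 3)
```

Trajectories starting at 1 do not a-satisfy any run, so they satisfy (3) vacuously; only the trajectory (0, 1, 0) and its prefixes are actually tested against the consequents.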

Comparison with Another Definition of Satisfaction. It is instructive to
compare (3) with the definition used in [3]:

$$\begin{aligned}
\forall \tau \in Traj(M) :\; & (\, \exists \rho \in Runs(A) : |\tau| = |\rho| \wedge \tau \models_a \rho \wedge \tau \models_c \rho \,) \;\vee \\
& (\, \forall \tau' \leq \tau : \forall \rho' \in Runs(A) : |\tau'| = |\rho'| \Rightarrow (\tau' \models_a \rho' \Rightarrow \tau' \models_c \rho') \,)
\end{aligned} \qquad (4)$$

where $\tau' \leq \tau$ means that $\tau'$ is a prefix of $\tau$.
Note that (3) implies (4), because $Traj(M)$ is prefix-closed. The converse is not
true; but if its first disjunct were removed, (4) would indeed be equivalent to (3).
That first disjunct, which contains an existential quantifier, makes (4) harder to
implement than (3). Intuitively, the existential quantifier requires backtracking
to implement. Formally, we will show in the next subsection that (3) holds iff
the set-theoretic STE algorithm returns a positive answer, and that (4) lacks
this nice property.

To get around this difficulty, [3] introduces the notion of oblivious trajectory
assertions. A trajectory assertion $A = (S, s_0, R, \pi_a, \pi_c)$ is oblivious iff for any
$s, s', s'' \in S$ such that $(s, s') \in R$ and $(s, s'') \in R$ (with $s' \neq s''$), it must be the case that
$\pi_a(s') \cap \pi_a(s'') = \emptyset$. Consequently, given any trajectory $\tau$, there is at most one
run $\rho$ of A such that $\tau$ a-satisfies $\rho$. It is not hard to see that for an oblivious
trajectory assertion, (4) implies (3), which then implies that (4) is equivalent to
the set-theoretic STE algorithm returning a positive answer. With our definition
(3), there is no need to introduce the notion of obliviousness.

2.3 Set-Theoretic STE as DFA

In this subsection we show that the checking of M |= A can be formulated as a 
DFA problem [4,6]. 

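Define $F \in S \to (\mathcal{P}(C) \to \mathcal{P}(C))$ such that $F(s)(p) = M(\pi_a(s) \cap p)$ for all
$s \in S$ and $p \in \mathcal{P}(C)$. It follows from (2) that for all $s \in S$, $F(s)$ preserves $\emptyset$ (i.e.,
$F(s)(\emptyset) = \emptyset$), is monotonic (i.e., $p \subseteq q \Rightarrow F(s)(p) \subseteq F(s)(q)$ for all $p, q \in \mathcal{P}(C)$),
and distributes over arbitrary union (i.e., $F(s)(\bigcup Q) = \bigcup \{ F(s)(q) \mid q \in Q \}$ for
all $Q \subseteq \mathcal{P}(C)$). Next, define $\mathcal{F} \in (S \to \mathcal{P}(C)) \to (S \to \mathcal{P}(C))$ such that:

$$\mathcal{F}(\Phi)(s) = \text{if } (s = s_0) \text{ then } C \text{ else } \bigcup \{\, F(s')(\Phi(s')) \mid (s', s) \in R \,\}$$

for all $\Phi \in S \to \mathcal{P}(C)$ and $s \in S$. Since $F(s)$ is monotonic for all $s \in S$, $\mathcal{F}$ is
monotonic as well, where the function space $S \to \mathcal{P}(C)$ is ordered as follows:
$\Phi \subseteq \Phi' \Leftrightarrow \forall s \in S : \Phi(s) \subseteq \Phi'(s)$ for all $\Phi, \Phi' \in S \to \mathcal{P}(C)$. Hence, by the Knaster-Tarski
Fixpoint Theorem [2], the fixpoint equation $\Phi = \mathcal{F}(\Phi)$ has a least solution
$\Phi_* \in S \to \mathcal{P}(C)$. Furthermore, since both $S$ and $C$ are finite, $\Phi_*$ is the limit of
the sequence $(\Phi_n \in S \to \mathcal{P}(C) \mid n \in \mathbb{N})$ defined by:

$$\Phi_n = \text{if } (n = 0) \text{ then } (\lambda s \in S : \emptyset) \text{ else } \mathcal{F}(\Phi_{n-1}) \qquad (5)$$

in the sense that there exists a sufficiently large $k \in \mathbb{N}$ such that $\Phi_n = \Phi_*$ for
all $n \geq k$.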

We say the circuit M satisfies the trajectory assertion A by set-theoretic
STE, denoted by $M \models_{Set} A$, iff $\forall s \in S : \Phi_*(s) \cap \pi_a(s) \subseteq \pi_c(s)$. Now we are
ready to state our first main result:

Theorem 1. $M \models_{Set} A \Leftrightarrow M \models A$
Proof. See Appendix A.4.

Had we used the definition adopted in [3] for $M \models A$ (viz., (4) above), the $\Rightarrow$
direction of Theorem 1 would still be true (furthermore, obliviousness is not
needed in this part of the proof; see Appendix A.4), but the $\Leftarrow$ direction would
be false, as the following example shows.^ Consider a trivial circuit M with only
one signal whose value is either 0 or 1 (i.e., $C = \{0, 1\}$); this signal is the output
of a constant source 1 (i.e., $M(p) = \{1\}$ for $\emptyset \neq p \subseteq \{0, 1\}$). Suppose that the
trajectory assertion A has only three states $S = \{s_0, s_1, s_2\}$ and two transitions
$R = \{(s_0, s_1), (s_0, s_2)\}$ such that $\pi_a(s_0) = \pi_a(s_1) = \pi_a(s_2) = \pi_c(s_0) = \{0, 1\}$ and
$\pi_c(s_1) = \{1\}$ and $\pi_c(s_2) = \{0\}$. Then (4) is satisfied, because for any trajectory
$\tau$ of M with $|\tau| = 2$, the run $\rho = (s_0, s_1)$ satisfies both $\tau \models_a \rho$ and $\tau \models_c \rho$. But
$M \not\models_{Set} A$, since $\Phi_*(s_2) = \{1\}$, $\pi_a(s_2) = \{0, 1\}$, but $\pi_c(s_2) = \{0\}$.
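The fixpoint iteration (5) and the check $M \models_{Set} A$ can be sketched in a few lines and run on the example above. This is only an illustrative, explicit-set implementation (the function name `ste_set` and the dictionary encoding of the labelings are assumptions of the sketch); it reproduces $\Phi_*(s_2) = \{1\}$ and confirms the observation about the run $(s_0, s_1)$ for trajectories of length 2.

```python
def ste_set(m, configs, states, s0, rel, pi_a, pi_c):
    """Iterate eq. (5) to the least fixpoint Phi*, then check
    Phi*(s) & pi_a(s) <= pi_c(s) for every state s."""
    phi = {s: frozenset() for s in states}           # Phi_0 = (lambda s: {})
    while True:
        nxt = {}
        for s in states:
            if s == s0:
                nxt[s] = frozenset(configs)          # source node: all of C
            else:
                flows = [m(pi_a[p] & phi[p]) for (p, q) in rel if q == s]
                nxt[s] = frozenset().union(*flows)   # union over predecessors
        if nxt == phi:
            break
        phi = nxt
    ok = all(phi[s] & pi_a[s] <= pi_c[s] for s in states)
    return ok, phi

# The circuit from the text: one signal driven by a constant source 1.
C = frozenset({0, 1})
def m(p):
    return frozenset({1}) if p else frozenset()

STATES = ["s0", "s1", "s2"]
REL = {("s0", "s1"), ("s0", "s2")}
BOTH = frozenset({0, 1})
PI_A = {"s0": BOTH, "s1": BOTH, "s2": BOTH}
PI_C = {"s0": BOTH, "s1": frozenset({1}), "s2": frozenset({0})}

ok, phi_star = ste_set(m, C, STATES, "s0", REL, PI_A, PI_C)

# Every trajectory of length 2 c-satisfies the run (s0, s1).
witness_ok = all(c0 in PI_C["s0"] and c1 in PI_C["s1"]
                 for c0 in C for c1 in m(frozenset({c0})))
```

The check fails only at $s_2$, where the flow value $\{1\}$ meets the antecedent but not the consequent $\{0\}$.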

^ Space limitations prevent us from including state-transition diagrams for this and 
subsequent examples, but the reader should have no trouble drawing his own. 




The Mathematical Foundation of Symbolic Trajectory Evaluation 






3 Lattice-Theoretic STE 

The definition (5) of the sequence $(\Phi_n \mid n \in \mathbb{N})$ above yields a simple method
for computing the least fixpoint solution $\Phi_*$: just compute $\Phi_0, \Phi_1, \Phi_2, \ldots$ one by
one until a fixpoint, which must be $\Phi_*$, is reached. Then, by Theorem 1, $M \models A$
can be checked by checking $M \models_{Set} A$. Since all objects involved are finite, this
scheme for checking $M \models A$, which we call the set-theoretic STE algorithm, is
clearly effective.

Unfortunately, the set-theoretic STE algorithm is not practical except for
small circuits. For, if the circuit M has $m$ boolean signals, then its set of
configurations is $\mathbb{B}^m$, where $\mathbb{B} = \{0, 1\}$ is the set of boolean values. Even with
state-of-the-art BDD technologies, manipulating subsets of $\mathbb{B}^m$ is impractical
for even moderately large $m$, say several hundred signals. But interesting
circuits in the real world often contain thousands of (if not more!) signals, for
which set-theoretic STE is powerless.

In this section we will describe what may be regarded as the key insight of
the STE paradigm. Namely, instead of manipulating subsets of $\mathbb{B}^m$ directly, we
approximate them with ternary vectors, whose sizes are only linear in $m$. But,
to compensate for possible loss of information in the approximation process, we
may have to complicate the trajectory assertion, or use a family of trajectory
assertions, or both. Yet, in both cases, the number of BDD variables depends
only on the trajectory assertion(s) and not on the circuit under analysis. This
makes it possible to do STE on very large circuits without first abstracting them.

We will use many concepts and notations from the theory of partial orders
and lattices [2]. In particular, the notions of complete lattices and Galois
connections are reviewed in Appendices A.2 and A.3, respectively.

3.1 Lattice-Theoretic Models of Circuits 

Recall that $M \in \mathcal{P}(C) \to \mathcal{P}(C)$ represents a circuit such that (2) is true, and
that the set $C$ of configurations of M is nonempty and finite. What exactly $C$
is, is not important until Section 4.

Let $(\hat{P}, \sqsubseteq)$ be a finite complete lattice of abstract predicates such that there is
a Galois connection $\ll \;\subseteq \mathcal{P}(C) \times \hat{P}$. An abstract predicate transformer $\hat{M} \in \hat{P} \to \hat{P}$
is an abstract interpretation [1,7] of $M \in \mathcal{P}(C) \to \mathcal{P}(C)$ iff $\hat{M}$ preserves bottom
(i.e., $\hat{M}(\bot) = \bot$), $\hat{M}$ is monotonic (i.e., $\hat{p} \sqsubseteq \hat{q} \Rightarrow \hat{M}(\hat{p}) \sqsubseteq \hat{M}(\hat{q})$ for all $\hat{p}, \hat{q} \in \hat{P}$),
and $\ll$ is a simulation relation from $\mathcal{P}(C)$ to $\hat{P}$ (i.e., $p \ll \hat{p} \Rightarrow M(p) \ll \hat{M}(\hat{p})$ for
all $p \in \mathcal{P}(C)$ and $\hat{p} \in \hat{P}$). That the Galois connection $\ll$ is a simulation relation
can also be stated in terms of its abstraction function $\alpha : \mathcal{P}(C) \to \hat{P}$ (viz.,
$\alpha(M(p)) \sqsubseteq \hat{M}(\alpha(p))$ for all $p \in \mathcal{P}(C)$) or in terms of its concretization function
$\gamma : \hat{P} \to \mathcal{P}(C)$ (viz., $M(\gamma(\hat{p})) \subseteq \gamma(\hat{M}(\hat{p}))$ for all $\hat{p} \in \hat{P}$). Although this notion of
simulation is standard in the literature on abstract interpretation [1,7], it seems
to have never been used in the literature on STE [8,3]. It would be interesting
to check whether actual implementations of STE satisfy this condition.

We do not require of $\hat{M}$ the counterpart of (2): $\hat{M}(\bigsqcup \hat{Q}) = \bigsqcup \{ \hat{M}(\hat{q}) \mid \hat{q} \in \hat{Q} \}$,
because it is not true in general. For example, suppose $\hat{M}$ abstracts a unit-delay
two-input AND-gate using ternary values. Then it is reasonable to require:

$$\hat{M}((0,1,X) \sqcup (1,0,X)) = \hat{M}((X,X,X)) = (X,X,X)$$
$$\hat{M}((0,1,X)) \sqcup \hat{M}((1,0,X)) = (X,X,0) \sqcup (X,X,0) = (X,X,0)$$

where the first two vector components correspond to the two inputs and the last
the output. Intuitively, the join operation $(0,1,X) \sqcup (1,0,X) = (X,X,X)$ throws
away the information that one of the inputs is 0, so $\hat{M}$ can no longer assign 0
to the output. Note, however, that the inequality $\hat{M}(\bigsqcup \hat{Q}) \sqsupseteq \bigsqcup \{ \hat{M}(\hat{q}) \mid \hat{q} \in \hat{Q} \}$
does hold, since it is implied by the monotonicity of $\hat{M}$.
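The AND-gate example can be replayed with a small ternary model. The sketch below is an assumption-laden illustration: ternary vectors are Python tuples over `0`, `1`, `"X"`, bottom is `None`, and `m_hat` is one plausible abstraction of a unit-delay AND gate matching the two equations above.

```python
X = "X"      # unknown value, the top of each component
BOT = None   # bottom element of the lattice of ternary vectors

def join(u, v):
    """Least upper bound: components that disagree become X."""
    if u is BOT:
        return v
    if v is BOT:
        return u
    return tuple(a if a == b else X for a, b in zip(u, v))

def leq(u, v):
    """u is below v in the ordering where 0 and 1 are both below X."""
    if u is BOT:
        return True
    if v is BOT:
        return False
    return all(a == b or b == X for a, b in zip(u, v))

def and3(a, b):
    """Ternary AND: a single 0 input forces the output to 0."""
    if a == 0 or b == 0:
        return 0
    return 1 if (a == 1 and b == 1) else X

def m_hat(v):
    """Unit-delay AND gate on (in1, in2, out): the next inputs are
    unknown, the next output is the AND of the current inputs."""
    if v is BOT:
        return BOT
    a, b, _out = v
    return (X, X, and3(a, b))

merged = m_hat(join((0, 1, X), (1, 0, X)))             # join first, then step
separate = join(m_hat((0, 1, X)), m_hat((1, 0, X)))    # step first, then join
```

Here `merged` is strictly above `separate`, which is the inequality implied by monotonicity; equality fails because the join has already forgotten that one input is 0.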

3.2 Lattice-Theoretic Trajectory Assertions 

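A trajectory assertion for $\hat{M}$ is a quintuple $\hat{A} = (S, s_0, R, \hat{\pi}_a, \hat{\pi}_c)$, where the
assumptions on $S$, $s_0$, and $R$ are the same as in Section 2.2, and $\hat{\pi}_a \in S \to \hat{P}$ and
$\hat{\pi}_c \in S \to \hat{P}$ are the antecedent and consequent labeling functions, respectively.
Define $\gamma(\hat{A}) = (S, s_0, R, \gamma(\hat{\pi}_a), \gamma(\hat{\pi}_c))$, where $\gamma(\hat{\pi}_a) = \lambda s \in S : \gamma(\hat{\pi}_a(s))$ and
$\gamma(\hat{\pi}_c) = \lambda s \in S : \gamma(\hat{\pi}_c(s))$. Note that $\gamma(\hat{A})$ is a trajectory assertion for M.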

3.3 Lattice-Theoretic STE as DFA 

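Define $\hat{F} \in S \to (\hat{P} \to \hat{P})$ such that $\hat{F}(s)(\hat{p}) = \hat{M}(\hat{\pi}_a(s) \sqcap \hat{p})$ for all $s \in S$ and
$\hat{p} \in \hat{P}$. For any $s \in S$, since $\hat{M}$ preserves bottom and is monotonic, $\hat{F}(s)$ also
preserves bottom (i.e., $\hat{F}(s)(\bot) = \bot$) and is monotonic (i.e., $\hat{p} \sqsubseteq \hat{q} \Rightarrow \hat{F}(s)(\hat{p}) \sqsubseteq
\hat{F}(s)(\hat{q})$ for all $\hat{p}, \hat{q} \in \hat{P}$). Next, define $\hat{\mathcal{F}} \in (S \to \hat{P}) \to (S \to \hat{P})$ such that:

$$\hat{\mathcal{F}}(\hat{\Phi})(s) = \text{if } (s = s_0) \text{ then } \top \text{ else } \bigsqcup \{\, \hat{F}(s')(\hat{\Phi}(s')) \mid (s', s) \in R \,\}$$

for all $\hat{\Phi} \in S \to \hat{P}$ and $s \in S$. Since $\hat{F}(s)$ is monotonic for all $s \in S$, $\hat{\mathcal{F}}$ is monotonic
as well, where the function space $S \to \hat{P}$ is ordered as follows: $\hat{\Phi} \sqsubseteq \hat{\Phi}' \Leftrightarrow \forall s \in
S : \hat{\Phi}(s) \sqsubseteq \hat{\Phi}'(s)$ for all $\hat{\Phi}, \hat{\Phi}' \in S \to \hat{P}$. Hence, by the Knaster-Tarski Fixpoint
Theorem [2], the fixpoint equation $\hat{\Phi} = \hat{\mathcal{F}}(\hat{\Phi})$ has a least solution $\hat{\Phi}_* \in S \to \hat{P}$.
Furthermore, since both $S$ and $\hat{P}$ are finite, $\hat{\Phi}_*$ is the limit of the sequence
$(\hat{\Phi}_n \in S \to \hat{P} \mid n \in \mathbb{N})$ defined by:

$$\hat{\Phi}_n = \text{if } (n = 0) \text{ then } (\lambda s \in S : \bot) \text{ else } \hat{\mathcal{F}}(\hat{\Phi}_{n-1}) \qquad (6)$$

in the sense that there exists a sufficiently large $k \in \mathbb{N}$ such that $\hat{\Phi}_n = \hat{\Phi}_*$ for
all $n \geq k$.

We say the abstract circuit $\hat{M}$ satisfies the abstract trajectory assertion $\hat{A}$ by
lattice-theoretic STE, denoted by $\hat{M} \models_{Lat} \hat{A}$, iff $\forall s \in S : \hat{\Phi}_*(s) \sqcap \hat{\pi}_a(s) \sqsubseteq \hat{\pi}_c(s)$.
Now we are ready to state our second main result:

Theorem 2. If $\hat{M}$ is an abstract interpretation of $M$, then:

$$\hat{M} \models_{Lat} \hat{A} \;\Rightarrow\; M \models_{Set} \gamma(\hat{A})$$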



Proof. See Appendix A.5.

The converse of Theorem 2 is not true. For example, consider a circuit with five
signals $(i_1, i_2, j_1, j_2, o)$, where $j_1$ (resp., $j_2$) is $i_1$ ($i_2$) delayed by one unit of time
and $o$ is the unit-delayed AND of $j_1$ and $j_2$. Suppose that the trajectory assertion
has five states $\{s_0, s_1, s_1', s_2, s_3\}$ and five transitions $\{(s_0, s_1), (s_0, s_1'), (s_1, s_2),
(s_1', s_2), (s_2, s_3)\}$ and the labeling: $\hat{\pi}_a(s_1) = (0,1,X,X,X)$, $\hat{\pi}_a(s_1') = (1,0,X,X,X)$,
$\hat{\pi}_c(s_3) = (X,X,X,X,0)$; all other labels are $(X,X,X,X,X)$. Intuitively, the
antecedent at $s_1$ (resp., $s_1'$) assumes that $i_1 = 0$ and $i_2 = 1$ (resp., $i_1 = 1$ and $i_2 = 0$) at
time 1, and the consequent at $s_3$ checks that at time 3, $o = 0$ regardless of which
assumption was used. It is easy to verify that for this example, $M \models_{Set} \gamma(\hat{A})$ but
$\hat{M} \not\models_{Lat} \hat{A}$. And the reason is simple: at time 2, when the information from $s_1$
and $s_1'$ is merged at $s_2$, we have:

$$\{(0,1)\} \cup \{(1,0)\} = \{(0,1), (1,0)\} \quad \text{but} \quad (0,1) \sqcup (1,0) = (X,X)$$

the latter of which loses information. Clearly, this merge could be avoided by
duplicating $s_2$ and $s_3$, so that there is a separate copy of them to deal with
each of the assumptions $\hat{\pi}_a(s_1)$ and $\hat{\pi}_a(s_1')$. But then the number of states in the
trajectory assertion increases. This kind of trade-off between complexity and
precision is typical of STE.
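The five-signal example can be run through an explicit version of iteration (6). The sketch below is illustrative only: tuples over `0`, `1`, `"X"` stand for ternary vectors, `None` for bottom, and `m_hat` is an assumed ternary model of the circuit described above. It evaluates both the original assertion, where $s_1$ and $s_1'$ merge at $s_2$, and a variant with $s_2$ and $s_3$ duplicated.

```python
X = "X"      # unknown ternary value
BOT = None   # bottom of the lattice of ternary vectors
TOP = (X,) * 5

def join(u, v):
    if u is BOT:
        return v
    if v is BOT:
        return u
    return tuple(a if a == b else X for a, b in zip(u, v))

def meet(u, v):
    if u is BOT or v is BOT:
        return BOT
    out = []
    for a, b in zip(u, v):
        if a == X:
            out.append(b)
        elif b == X or a == b:
            out.append(a)
        else:
            return BOT               # conflicting 0/1 constraints
    return tuple(out)

def leq(u, v):
    if u is BOT:
        return True
    if v is BOT:
        return False
    return all(a == b or b == X for a, b in zip(u, v))

def and3(a, b):
    if a == 0 or b == 0:
        return 0
    return 1 if (a == 1 and b == 1) else X

def m_hat(v):
    """Signals (i1, i2, j1, j2, o): j1, j2 are the delayed inputs and
    o is the delayed AND of j1 and j2; the next inputs are unknown."""
    if v is BOT:
        return BOT
    i1, i2, j1, j2, _o = v
    return (X, X, i1, i2, and3(j1, j2))

def ste_lat(states, s0, rel, pi_a, pi_c):
    """Iterate eq. (6) to the least fixpoint and check the consequents.
    States missing from pi_a or pi_c are labeled with TOP."""
    phi = {s: BOT for s in states}
    while True:
        nxt = {}
        for s in states:
            if s == s0:
                nxt[s] = TOP
            else:
                acc = BOT
                for (pred, succ) in rel:
                    if succ == s:
                        flow = m_hat(meet(pi_a.get(pred, TOP), phi[pred]))
                        acc = join(acc, flow)
                nxt[s] = acc
        if nxt == phi:
            ok = all(leq(meet(phi[s], pi_a.get(s, TOP)), pi_c.get(s, TOP))
                     for s in states)
            return ok, phi
        phi = nxt

ANTE = {"s1": (0, 1, X, X, X), "s1'": (1, 0, X, X, X)}
GOAL = (X, X, X, X, 0)

merged_ok, merged_phi = ste_lat(
    ["s0", "s1", "s1'", "s2", "s3"], "s0",
    [("s0", "s1"), ("s0", "s1'"), ("s1", "s2"), ("s1'", "s2"), ("s2", "s3")],
    ANTE, {"s3": GOAL})

split_ok, split_phi = ste_lat(
    ["s0", "s1", "s1'", "s2", "s2'", "s3", "s3'"], "s0",
    [("s0", "s1"), ("s0", "s1'"), ("s1", "s2"), ("s1'", "s2'"),
     ("s2", "s3"), ("s2'", "s3'")],
    ANTE, {"s3": GOAL, "s3'": GOAL})
```

With the merge, the flow value at $s_2$ collapses to all X and the check at $s_3$ fails; with the duplicated states, each branch keeps its own 0 on $j_1$ or $j_2$ and the consequent holds, at the price of two extra states.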

4 An Implicit Algorithm for Lattice-Theoretic STE 

Up to this point, except in a few examples, we have not needed to specify what
exactly the set $C$ of configurations is, except that $C$ should be nonempty and
finite. This makes our theory more general. But in order to have a BDD-based
implementation, we have to make up our mind now as to what $C$ is. Thus, in
this section, we shall assume that $C = \mathbb{B}^m$ for some $m \in \mathbb{N}$. In other words, M
is a boolean circuit with $m$ signals. Furthermore, we assume that the abstract
circuit $\hat{M}$ operates on ternary vectors, i.e., $\hat{P} = \mathbb{T}^m_\bot$. How sets of boolean vectors
can be approximated by ternary vectors is explained in Appendix A.3.

Similar to the set-theoretic case, the definition (6) of the sequence $(\hat{\Phi}_n \mid n \in
\mathbb{N})$ yields a simple algorithm for checking $\hat{M} \models_{Lat} \hat{A}$: compute $\hat{\Phi}_0, \hat{\Phi}_1, \hat{\Phi}_2, \ldots$ one
by one until a fixpoint, which must be $\hat{\Phi}_*$, is reached; then check $\hat{M} \models_{Lat} \hat{A}$
using its definition. Note that since the converse of Theorem 2 is not true, this
algorithm, which we call the lattice-theoretic STE algorithm, can give falsely
negative answers (i.e., when $\hat{M} \not\models_{Lat} \hat{A}$ but $M \models \gamma(\hat{A})$). But, by virtue of
Theorems 1 and 2, it can never produce falsely positive answers (i.e., $\hat{M} \models_{Lat} \hat{A}$
does imply $M \models \gamma(\hat{A})$). We now argue that the lattice-theoretic STE algorithm
can be implemented using BDDs in a straightforward manner.

First, notice that every ternary value $t \in \mathbb{T}$ can be encoded with two boolean
values: $B_0(t) = (0 \sqsubseteq t)$ and $B_1(t) = (1 \sqsubseteq t)$. With this encoding, join and meet
are implemented by: $B_i(t \sqcup t') = B_i(t) \vee B_i(t')$ and $B_i(t \sqcap t') = B_i(t) \wedge B_i(t')$,
where $i \in \mathbb{B}$. For any $m \in \mathbb{N}$, this encoding and the associated join and meet
operations can be extended component-wise to $\mathbb{T}^m_\bot$. Note that $\bot$ has multiple
encodings (viz., all $m$-tuples of boolean pairs in which at least one of the pairs
is such that $B_0 = B_1 = 0$).

Without loss of generality, suppose the state space $S$ of the trajectory
assertion is $\mathbb{B}^k$ for some $k \in \mathbb{N}$. With the above encoding of ternary values, the
objects manipulated by the lattice-theoretic STE algorithm have the following
types: $R \in \mathbb{B}^k \times \mathbb{B}^k \to \mathbb{B}$ and $\hat{\pi}_a, \hat{\pi}_c, \hat{\Phi}_n \in \mathbb{B}^k \to (\mathbb{B} \times \mathbb{B})^m$, for all $n \in \mathbb{N}$. It
is not hard to see that these objects can all be represented by BDDs on at most
$2k$ variables, and that $\hat{\mathcal{F}}$ and the checking of $\hat{M} \models_{Lat} \hat{A}$ can be implemented by
BDD operations on these BDDs, provided that the output of the abstract circuit
$\hat{M} \in \mathbb{T}^m_\bot \to \mathbb{T}^m_\bot$ for any given input can be computed without ever having to
represent $\hat{M}$ itself as BDDs (which would require $2m$ variables). Real-world STE
implementations amply demonstrate that this proviso is practical.
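The two-boolean encoding of ternary values described above can be checked exhaustively for a single component. The sketch below uses plain Python booleans in place of BDDs, and the names `encode`, `decode`, and the string `"bot"` for bottom are illustrative assumptions; it verifies that join and meet correspond to bitwise OR and AND on the encoded pairs.

```python
X = "X"        # unknown value (top)
BOT = "bot"    # bottom, encoded as the pair (False, False)
VALUES = [0, 1, X, BOT]

def leq(a, b):
    """Ordering with bottom below 0 and 1, which are both below X."""
    return a == BOT or a == b or b == X

def encode(t):
    """(B0(t), B1(t)) = (0 <= t, 1 <= t)."""
    return (leq(0, t), leq(1, t))

def decode(pair):
    table = {(True, False): 0, (False, True): 1,
             (True, True): X, (False, False): BOT}
    return table[pair]

def join(a, b):
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else X

def meet(a, b):
    if a == X:
        return b
    if b == X:
        return a
    return a if a == b else BOT

def pair_or(p, q):
    return (p[0] or q[0], p[1] or q[1])

def pair_and(p, q):
    return (p[0] and q[0], p[1] and q[1])

# Join is OR and meet is AND on the encoded pairs, for all value pairs.
homomorphic = all(
    encode(join(a, b)) == pair_or(encode(a), encode(b)) and
    encode(meet(a, b)) == pair_and(encode(a), encode(b))
    for a in VALUES for b in VALUES)
```

For a single component the encoding of bottom is unique; the multiple encodings mentioned in the text arise only for vectors, where one (False, False) pair anywhere already denotes bottom.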

We emphasize again that the maximum number of boolean variables needed
by our algorithm, $2k$, depends only on the trajectory assertion and not on the
circuit. Of course, this independence is somewhat illusory, since the possible loss of
information in the approximation by ternary vectors may necessitate a more
complex state-transition structure in the trajectory assertion, which would increase
$k$. Furthermore, note that our formulation so far has been "unparameterized"
in the sense that the antecedents and consequents are simple ternary vectors
without parameters. In fact, they can be parameterized by boolean variables, so
that a single run of the parameterized algorithm is equivalent to multiple runs
of the unparameterized algorithm, one for each truth assignment to the boolean
parameters. Needless to say, such parameters increase further the total number
of boolean variables.

Despite its simplicity, the STE algorithm outlined above does not seem to 
have ever been implemented. Can it be as efficient in practice as, or even more so 
than, current implementations of extended STE? We do not know, but it would 
be interesting to find out. 

A Appendices 

A.1 Sequences

Let $\mathbb{N} = \{0, 1, 2, \ldots\}$ be the set of natural numbers. For any set $V$ and any $n \in \mathbb{N}$,
$V^n$ (resp., $V^+$ and $V^*$) denotes the set of all finite sequences of length $n$ (resp., positive
and nonnegative lengths) over $V$. Let $\sigma, \tau \in V^*$. The length of $\sigma$ is denoted by $|\sigma|$,
the concatenation of $\sigma$ followed by $\tau$ by $\sigma^\frown\tau$, and $\sigma$ being a prefix of $\tau$ by $\sigma \leq \tau$. A
set $\Sigma \subseteq V^*$ is prefix-closed iff $\sigma \in \Sigma$ and $\tau \leq \sigma$ imply $\tau \in \Sigma$. For any $i \in \mathbb{N}$ with
$0 \leq i < |\sigma|$, the $i$-th element of $\sigma$ is denoted by $\sigma[i]$. (Note that we index sequence
elements starting from 0 instead of 1.) The last element of $\sigma$ is denoted by $last(\sigma)$,
i.e., $last(\sigma) = \sigma[|\sigma| - 1]$. The empty sequence (i.e., the sequence whose length is 0)
is denoted by $()$. A sequence consisting of elements $v_0, v_1, v_2, \ldots, v_{n-1} \in V$ (in that
order) is written as $(v_0, v_1, v_2, \ldots, v_{n-1})$. We use the terms "sequences" and "vectors"
interchangeably; the elements of vectors are sometimes referred to as "components".

A.2 Complete Lattices

A complete lattice is a poset $(P, \sqsubseteq)$ in which the meet and join of elements of any
subset $Q \subseteq P$, denoted by $\bigsqcap Q$ and $\bigsqcup Q$ respectively, always exist. Intuitively, we think
of the elements of a complete lattice as "predicates", so that $\sqcap$, $\sqcup$, and $\sqsubseteq$ correspond
to "conjunction", "disjunction", and "implication", respectively.

For any set $V$, its power set $\mathcal{P}(V)$, ordered by set inclusion $\subseteq$, forms a complete
lattice. Here $\sqcap$, $\sqcup$, and $\sqsubseteq$ are $\cap$, $\cup$, and $\subseteq$, respectively.

Let $\mathbb{T} = \{0, 1, X\}$ be the set of ternary values, where $X$ denotes an unknown value.
Intuitively, $X$ signifies a lack of information: it could be 0, it could be 1; we simply
don't know. We partially order $\mathbb{T}$ as follows:^ $0 \sqsubseteq X$ and $1 \sqsubseteq X$. For any $m \in \mathbb{N}$, this
order on $\mathbb{T}$ is extended component-wise to $\mathbb{T}^m$. But $(\mathbb{T}^m, \sqsubseteq)$ is not a complete lattice,
because it lacks a bottom. We fix this by introducing a special bottom element, $\bot$,
such that $\bot \sqsubseteq t$ and $\bot \neq t$ for all $t \in \mathbb{T}^m$. Now $\mathbb{T}^m_\bot = \mathbb{T}^m \cup \{\bot\}$, ordered by $\sqsubseteq$, is
indeed a complete lattice. We denote the top element $(X, \ldots, X)$ of $\mathbb{T}^m_\bot$ by $\top$.

A.3 Galois Connections

Galois Connections as Relations. Let $(P^c, \sqsubseteq^c)$ and $(P^a, \sqsubseteq^a)$ be complete lattices
of "concrete predicates" and "abstract predicates", respectively. (Below we will drop $c$
and $a$ from $\sqsubseteq^c$ and $\sqsubseteq^a$ and the meet and join operations they induce, since they will
always be clear from the context.) A Galois connection [2,7] from $P^c$ to $P^a$ is a binary
relation $\ll \;\subseteq P^c \times P^a$, where $p^c \ll p^a$ reads: "$p^c$ can be approximated by $p^a$", such
that for all $Q^c \subseteq P^c$ and $Q^a \subseteq P^a$:

$$Q^c \ll Q^a \;\Leftrightarrow\; \bigsqcup Q^c \ll \bigsqcap Q^a \qquad (7)$$

where we define: $Q^c \ll Q^a \Leftrightarrow \forall p^c \in Q^c : \forall p^a \in Q^a : p^c \ll p^a$. Intuitively, (7) says that
the approximation relation $\ll$ is an extension of the partial orders inside $P^c$ and $P^a$
to between $P^c$ and $P^a$.

Galois Connections as Functions. The usual definitions of Galois connections
in the literature [2,7] are in terms of an abstraction function $\alpha : P^c \to P^a$ and a
concretization function $\gamma : P^a \to P^c$, which in our framework can be derived from $\ll$
as follows:

$$\alpha(p^c) = \bigsqcap \{\, p^a \in P^a \mid p^c \ll p^a \,\} \qquad \gamma(p^a) = \bigsqcup \{\, p^c \in P^c \mid p^c \ll p^a \,\} \qquad (8)$$

for all $p^c \in P^c$ and $p^a \in P^a$. Intuitively, $\alpha(p^c)$ (resp., $\gamma(p^a)$) is the most precise
approximation of $p^c$ ($p^a$) in $P^a$ ($P^c$). Conversely, the relation $\ll$ can be derived from $\alpha$ or $\gamma$
as follows:

$$p^c \ll p^a \Leftrightarrow \alpha(p^c) \sqsubseteq p^a \qquad p^c \ll p^a \Leftrightarrow p^c \sqsubseteq \gamma(p^a) \qquad (9)$$

for all $p^c \in P^c$ and $p^a \in P^a$. It is easy to see from (8) and (9) how $\alpha$ and $\gamma$ can be
derived from each other. It is not hard to show that $\gamma$ preserves top (i.e., $\gamma(\top^a) = \top^c$),
is monotonic (i.e., $p^a \sqsubseteq q^a \Rightarrow \gamma(p^a) \sqsubseteq \gamma(q^a)$ for all $p^a, q^a \in P^a$), and distributes over
arbitrary meet (i.e., $\gamma(\bigsqcap Q^a) = \bigsqcap \{ \gamma(q^a) \in P^c \mid q^a \in Q^a \}$ for all $Q^a \subseteq P^a$). Similarly, $\alpha$
preserves bottom, is also monotonic, and distributes over arbitrary join.

Galois Connection from $\mathcal{P}(\mathbb{B}^m)$ to $\mathbb{T}^m_\bot$. For any $m \in \mathbb{N}$, there is a natural Galois
connection $\ll$ from $\mathcal{P}(\mathbb{B}^m)$ to $\mathbb{T}^m_\bot$, which is most conveniently defined by specifying
its concretization function $\gamma : \mathbb{T}^m_\bot \to \mathcal{P}(\mathbb{B}^m)$:

$$\gamma((t_0, \ldots, t_{m-1})) = \{\, (b_0, \ldots, b_{m-1}) \in \mathbb{B}^m \mid \forall i < m : t_i \neq X \Rightarrow b_i = t_i \,\}$$
$$\gamma(\bot) = \emptyset$$

for all $(t_0, \ldots, t_{m-1}) \in \mathbb{T}^m$. In other words, for any ternary vector $t \in \mathbb{T}^m$, $\gamma(t)$ is the
set of all boolean vectors in $\mathbb{B}^m$ that agree with $t$ on all non-$X$ components (so $X$'s can
be thought of as "wild cards"). Note that $\gamma$ is in fact a bijection from $\mathbb{T}^m_\bot$ to those
subsets of $\mathbb{B}^m$ that are (hyper)cubes. From $\gamma$, the Galois connection $\ll \;\subseteq \mathcal{P}(\mathbb{B}^m) \times \mathbb{T}^m_\bot$
and the abstraction function $\alpha : \mathcal{P}(\mathbb{B}^m) \to \mathbb{T}^m_\bot$ can be easily derived using (8) and (9).
Intuitively, $b \ll t$ iff the cube corresponding to $t$ contains $b$, and $\alpha(b)$ is the element of
$\mathbb{T}^m_\bot$ that corresponds to the smallest cube in $\mathbb{B}^m$ that contains $b$.

^ Our ordering of $\mathbb{T}$ is the reverse of that used in [8,3]. We do so because we want to
make clear that the ordering of $\mathbb{T}$ is an abstraction of set inclusion.
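The concretization and abstraction functions for ternary vectors are easy to write down explicitly for small $m$. The following sketch is illustrative only (tuples over `0`, `1`, `"X"` for ternary vectors, `None` for bottom, explicit sets instead of BDDs); it checks the second equivalence in (9) in the form $\alpha(b) \sqsubseteq t \Leftrightarrow b \subseteq \gamma(t)$ for $m = 2$.

```python
from itertools import chain, combinations, product

X = "X"      # wild card
BOT = None   # bottom, concretized to the empty set

def gamma(t, m):
    """Concretization: the cube of boolean vectors that agree with t
    on every non-X component."""
    if t is BOT:
        return frozenset()
    return frozenset(b for b in product((0, 1), repeat=m)
                     if all(ti == X or ti == bi for ti, bi in zip(t, b)))

def alpha(bs):
    """Abstraction: the ternary vector of the smallest cube containing bs."""
    bs = list(bs)
    if not bs:
        return BOT
    return tuple(col[0] if len(set(col)) == 1 else X for col in zip(*bs))

def leq(u, v):
    if u is BOT:
        return True
    if v is BOT:
        return False
    return all(a == b or b == X for a, b in zip(u, v))

M = 2
VECTORS = list(product((0, 1), repeat=M))
SUBSETS = [frozenset(s) for s in chain.from_iterable(
    combinations(VECTORS, r) for r in range(len(VECTORS) + 1))]
TERNARY = [BOT] + list(product((0, 1, X), repeat=M))

# gamma is injective on ternary vectors: alpha undoes it.
round_trip = all(alpha(gamma(t, M)) == t for t in TERNARY)
# Every set is contained in the cube of its abstraction.
extensive = all(b <= gamma(alpha(b), M) for b in SUBSETS)
# Galois property: alpha(b) below t  <=>  b contained in gamma(t).
galois = all(leq(alpha(b), t) == (b <= gamma(t, M))
             for b in SUBSETS for t in TERNARY)
```

For instance, `alpha({(0, 1), (1, 0)})` is `(X, X)`, whose cube contains all four boolean vectors, which is the loss of information discussed in Section 3.3.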

A. 4 Proof of Theorem 1 

We prove the two directions of separately. 

The direction: Suppose this is not true, i.e., M |=set A but M ^ A. Then 
M ^ A implies that there exist r G Truj[M) and p G Runs[A) such that |r| = |p|, 

T \=a p, and A \=c p^ but last(r) 0 7 Tc(/ast(p)), where A and p^ are the prefixes of 

T and p respectively such that \r\ = |r| — 1 and \p^\ = |p| — 1. We claim that for all 

i G N with 0 < i < |t|, r[i] G 4^~^(p[i]). This is proved by induction on i. The base case 

t = 0 is trivial, since = C. For the induction step, assume the claim is true for 

i < \t\- 1. Then r[i] G 4>^{p[i]) n7Ta(p[i]), since r |=a p. So r[i-\- 1] G F{p[i]){4>^{p[i])), 
since r G Traj{M). But, since 4^^ = F{p[i])(4^{p[i])) C 4^{p[i + 1]). This 

completes the induction step, so the claim is true. But the claim implies that last(r) G 
4^[loM[p)) n 7Ta(/ast(p)), which implies last(r) G 7Vc{last(p)) because M |=set A. So 
last(r) G 7Vc{lcist(p)) and last(r) 0 7Tc(^ast(p)), a contradiction. 

The converse direction: Since $F(s)$ is distributive over arbitrary union for all $s \in S$, a well-known result from DFA [4] states that the least fixpoint solution of the equation $\Phi = F(\Phi)$ is identical to the union-over-all-runs solution. More precisely, $\Phi_*$ must satisfy the following equation:

4^[s) = if (s = So) then C else U {G[p) \ p G Runs[A) A [last^p), s) G R} 

for all s G S, where G : Runs[A) U {( )} ^ V{G) is defined inductively by C(()) = G 
and G(p'^'{s)) = P(s) (C(p)). Let c G C and s G S. Using the definitions of C, U, and 
M, the equation above can be rephrased as: 

c£ 4^{s) ^ dr G Traj{M) : 3 p G Runs{A) : 

|r| = |p| A last(r) = c A last{p) = s A 
Vi G N : 0 < i < |r| - 1 r[i] G ^a{p[i]) 

Conjoining c G 7 Ta(s) to both sides, we get: 

c G 4>=^(s) n 7 Ta(s) dr G Traj(M) : dp G Runs(A) : 

|r| = |p| A last{r) = c A last[p) = s A r \—a p 

Now the definition of M |= A shows that the ^ direction is indeed true. 

A.5 Proof of Theorem 2

Throughout this proof, we will freely use the right half of (9): p p p C y(p) 
for all p G V{G) and p ^ F. For set-theoretic STE, the notations are exactly the same 
as in Section 2.3 and Appendix A. 4, except that the (concrete) trajectory assertion is 
7 (A) = (A, So, R, 7 (^a), t(^c)) instead of A. 

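Claim 1: $p \ll \hat{p} \Rightarrow F(s)(p) \ll \hat{F}(s)(\hat{p})$ for all $p \in \mathcal{P}(C)$, all $\hat{p}$ in the abstract lattice, and all $s \in S$. This is proved as follows:

$$\begin{array}{rcll}
\gamma(\hat{F}(s)(\hat{p})) & = & \gamma(\hat{M}(\hat{\pi}_a(s) \sqcap \hat{p})) & \{\ \text{definition of } \hat{F}\ \} \\
& \supseteq & M(\gamma(\hat{\pi}_a(s) \sqcap \hat{p})) & \{\ \ll \text{ is a simulation relation}\ \} \\
& = & M(\gamma(\hat{\pi}_a(s)) \cap \gamma(\hat{p})) & \{\ \text{distributivity of } \gamma\ \} \\
& \supseteq & M(\gamma(\hat{\pi}_a(s)) \cap p) & \{\ p \subseteq \gamma(\hat{p}) \text{ and monotonicity of } M\ \} \\
& = & F(s)(p) & \{\ \text{definition of } F\ \}
\end{array}$$

where { ⋯ }’s give the justifications of proof steps.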

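Claim 2: $\Phi_*(s) \ll \hat{\Phi}_*(s)$ for all $s \in S$. Since $\Phi_*(s) = \lim \Phi_n(s)$ and $\hat{\Phi}_*(s) = \lim \hat{\Phi}_n(s)$, it suffices to prove that $\Phi_n(s) \ll \hat{\Phi}_n(s)$ for all $s \in S$ and $n \in \mathbb{N}$. This is proved by induction on $n$. The base case is trivial, since a Galois connection always relates the two bottoms. For the induction step, assume $\Phi_n(s) \ll \hat{\Phi}_n(s)$ for all $s \in S$. That $\Phi_{n+1}(s_0) \ll \hat{\Phi}_{n+1}(s_0)$ is also trivial, since a Galois connection always relates the two tops. For any $s_0 \neq s \in S$, we have:

$$\begin{array}{rcll}
\gamma(\hat{\Phi}_{n+1}(s)) & = & \gamma(\bigsqcup \{ \hat{F}(s')(\hat{\Phi}_n(s')) \mid (s', s) \in R \}) & \{\ (6)\ \} \\
& \supseteq & \bigcup \{ \gamma(\hat{F}(s')(\hat{\Phi}_n(s'))) \mid (s', s) \in R \} & \{\ \text{monotonicity of } \gamma\ \} \\
& \supseteq & \bigcup \{ F(s')(\Phi_n(s')) \mid (s', s) \in R \} & \{\ \Phi_n(s) \ll \hat{\Phi}_n(s) \text{ and claim 1}\ \} \\
& = & \Phi_{n+1}(s) & \{\ (5)\ \}
\end{array}$$

This completes the induction step, so the claim is true.

Finally, suppose $M \models_{Lat} \hat{A}$. Then, for all $s \in S$, we have:

$$\begin{array}{rcll}
\gamma(\hat{\pi}_c(s)) & \supseteq & \gamma(\hat{\Phi}_*(s) \sqcap \hat{\pi}_a(s)) & \{\ M \models_{Lat} \hat{A} \text{ and monotonicity of } \gamma\ \} \\
& = & \gamma(\hat{\Phi}_*(s)) \cap \gamma(\hat{\pi}_a(s)) & \{\ \text{distributivity of } \gamma\ \} \\
& \supseteq & \Phi_*(s) \cap \gamma(\hat{\pi}_a(s)) & \{\ \text{claim 2}\ \}
\end{array}$$

Therefore, $M \models_{set} \gamma(\hat{A})$.

Assume-Guarantee Refinement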
between Different Time Scales *

Abstract. Refinement checking is used to verify implementations against more abstract specifications. Assume-guarantee reasoning is used to decompose refinement proofs in order to avoid state-space explosion. In previous approaches, specifications are forced to operate on the same time scale as the implementation. This may lead to unnatural specifications and inefficiencies in verification. We introduce a novel methodology for decomposing refinement proofs of temporally abstract specifications, which specify implementation requirements only at certain sampling instances in time. Our new assume-guarantee rule allows separate refinement maps for specifying functionality and timing. We present the theory for the correctness of our methodology, and illustrate it using a simple example. Support for sampling and the generalized assume-guarantee rule have been implemented in the model checker Mocha and successfully applied to verify the VGI multiprocessor dataflow chip with 6 million transistors.

1 Introduction 

Formal verification is a systematic approach for detecting logical errors in designs. The design is first described in a language with a mathematical semantics and then analyzed for correctness with respect to a specification. We refer to the design being analyzed as the implementation. The verification problem is called refinement checking when the specification is a more abstract design. The refinement-checking problem is PSPACE-hard in the size of the implementation description. Not surprisingly, algorithms for refinement checking are exponential in the size of the implementation description. This is the so-called state-explosion problem in verification.

Specifications are typically less detailed than the implementation. For example, the specification of an adder might simply state that the output is the sum of the two inputs, whereas the implementation might be a gate-level adder circuit, which operates at the detail of individual bits. Nonetheless, common notions of correctness require specifications to operate in “lock-step” with the implementation: every possible computation step of the implementation must be matched by an admissible computation step of the specification. If the natural time scale of the specification is less detailed than that of the implementation (for example, if the gate-level adder requires several clock cycles to compute a sum), then the specification often is “slowed down” by stuttering, even for perfectly synchronous designs. A prominent example of this occurs in pipeline verification, where the Instruction Set Architecture (ISA) specification usually is slowed down by introducing a nondeterministic stall signal to stretch its time scale to match that of the pipeline [HQR98,McM98]. Instead of slowing down the specification, we pursue the alternative of “speeding up” the implementation. For this purpose, we use an operator called Sample, which samples the behavior of the implementation at appropriately defined sampling instants.¹

Our motivation for sampling arose specifically from the attempt to verify a 96-processor V(ideo) G(raphics) I(mage) chip designed by the Infopad project at the University of California, Berkeley [STUR98]. There, the specification consists of ISAs for the individual processors and FIFO buffers that abstract the point-to-point communication protocols, which interact subtly with the processor pipelines. Since the implementation contains level-sensitive latches and different parts of the circuit are active at high vs. low phases of the clock, sampling must be used to match the implementation time scale with the specification time scale. While the computational advantages of sampling in state-space exploration have been demonstrated in [AHR98], the VGI is still far beyond the scope of exhaustive search. Hence, we needed to generalize a compositional verification methodology to accommodate the Sample operator.

Scalable approaches to refinement checking make use of the compositional structure of both implementation and specification, and divide the verification task at hand into simpler subtasks. Suppose we have a refinement-checking problem of the form $P_1 \| P_2 \preceq Q_1 \| Q_2$, and the state space of the implementation $P_1 \| P_2$, even when sampled, is too large to be handled by exhaustive search algorithms. A naive compositional approach would attempt to prove both $P_1 \preceq Q_1$ and $P_2 \preceq Q_2$, and then conclude that $P_1 \| P_2 \preceq Q_1 \| Q_2$. Though sound, the naive approach often fails in practice, because $P_1$ usually refines $Q_1$ only in a suitable constraining environment, and so does $P_2$. The constraining environments are taken into account by the assume-guarantee approach, which concludes $P_1 \| P_2 \preceq Q_1 \| Q_2$ from the two proof obligations $P_1 \| Q_2 \preceq Q_1$ and $Q_1 \| P_2 \preceq Q_2$ (the apparent circularity in such proofs is resolved by an induction over time) [Sta85,Kur94,AL95,AH96,McM97,HQR98]. Note that the assume-guarantee approach avoids constructing $P_1 \| P_2$. With assume-guarantee reasoning, the difficulty in refinement checking shifts from the sizes of the involved state spaces to the “semantic gap” between implementation and specification. To bridge this gap, one writes witness modules that map implementation signals to specification signals, and refinement constraints that relate specification signals to implementation signals [HQR98].

¹ The Sample operator is similar, but not identical, to the Next operator of Reactive Modules [AH96]: while Next changes the time scale of a module and its environment, Sample does not constrain the environment, which therefore may offer multiple inputs between sampling instances.

In this paper, we generalize the assume-guarantee method to accommodate the sampling operator. If implementation and specification operate at the same time scale, witness modules generate values for hidden specification variables at each step. However, if a single macro-step of the specification corresponds to several micro-steps of the implementation, it is necessary to provide witness modules that operate at the micro-step level. The purpose of such a witness is to generate the correct value for the specification signal to be witnessed and to maintain that value until the next sampling instance. Dually, if implementation and specification operate at the same time scale, refinement constraints provide abstract definitions for implementation variables at each step. If one specification step corresponds to several implementation steps, then it no longer suffices for the refinement constraints to supply values for the implementation variables at the rate of the specification (that is, at sampling instances); additional constraints need to be provided between sampling instances. Providing different refinement constraints at the macro and micro levels enables a separation of concerns: while macro-level constraints (at sampling instances) tend to describe the functional behavior of an implementation variable, micro-level constraints (between sampling instances) tend to describe its timing behavior. This separation of functionality and timing is particularly natural for collections of synchronous blocks that communicate asynchronously.

We develop the theory and methodology to carry out assume-guarantee reasoning when specifications are abstract in both space (fewer variables/components) and time (fewer observation points). The crux of the theory lies in the ability to distribute the Sample operator over the parallel composition of implementation components using micro-level refinement constraints. The resulting assume-guarantee proof rule produces refinement obligations both at the macro level and at the micro level, which are then discharged by our model checker Mocha [AHM+98]. While micro-level refinement obligations can be handled using traditional model checking, we had to enhance Mocha to discharge macro-level refinement obligations. We have used this methodology successfully in the verification of the VGI chip. We found several subtle bugs which were unknown to the designers. The case study describing this effort can be found in [HLQR99].

Speeding up the implementation using Sample has significant computational advantages when compared to earlier approaches that slow down the specification using stuttering. The sampled implementation typically has a smaller state space than the original implementation. If the refinement checking uses explicit state enumeration, we can use a secondary stack to explore and discard the implementation states between two sampling instances. If we use symbolic search with binary decision diagrams (BDDs), the sampled state space can typically be represented using a smaller BDD [AHR98].

Outline of the paper. Since the VGI is too complex for the purpose of exposing our methodology, we present a simple GCD example in Section 2. Mocha operates in the heterogeneous modeling framework of Reactive Modules [AH96]. In order to introduce sampling, we have to generalize modules to so-called transition constraints, which need not be executable. This is done in Section 3. The Sample operator and its properties are introduced in Section 4. The assume-guarantee methodology of [McM97,HQR98,McM98] is generalized in Section 5 to accommodate the Sample operator, and the generalized methodology is applied in Section 6 to the GCD example. Finally, in Section 7, the VGI verification is described briefly.



2 Example 




Fig. 1. GCD specification: (a) high-level specification, (b) intermediate specification

We consider a design that computes the Greatest Common Divisor (GCD) of two numbers. We will start with a synchronous specification shown in Figure 1(a). Given two inputs a and b, the module GCDSp1 computes the GCD of a and b and places the result in the output r. The boolean input validin asserts that the inputs are valid in the current round and the boolean output validout asserts that the output is valid in the current round. Module GCDSp1 operates synchronously, with a delay of one round. If inputs a and b are given in the current round, then the output is available at r in the next round.

We refine our specification and add more spatial and temporal detail on how the GCD is computed. We use Euclid’s algorithm to compute the GCD:

GCD(a, b)
{ Given positive non-zero integers a and b, compute GCD(a, b) }
(1) if (a = b) return (a);
(2) if ((a = 1) or (b = 1)) return (1);
(3) small := min(a, b);
(4) big := max(a, b);
(5) return (GCD(small, big - small));
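The steps above can be sketched in Python as follows. This is only an illustrative rendering of the recursive pseudocode as a loop; the function name `gcd_subtractive` and the input check are assumptions made for this sketch, not part of the original design.

```python
def gcd_subtractive(a: int, b: int) -> int:
    """Subtraction-based Euclid, mirroring steps (1)-(5) of the pseudocode."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    while True:
        if a == b:                 # step (1)
            return a
        if a == 1 or b == 1:       # step (2)
            return 1
        small = min(a, b)          # step (3)
        big = max(a, b)            # step (4)
        a, b = small, big - small  # step (5): the recursive call becomes the next iteration
```

The number of loop iterations depends on the inputs, which is the reason a refinement that follows this algorithm step by step needs a data-dependent number of rounds.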



The resulting refinement GCDSp2 (Figure 1(b)) has three modules: IntfS, DoneS, and CompS. Given two numbers, the DoneS module decides if the GCD is computed trivially (if the numbers are equal, or one of the numbers is 1). If so, it sends the result; otherwise, it resends the numbers in increasing order. Suppose small and big are sent by the DoneS module. The CompS module responds by sending small and big - small. The IntfS module takes data inputs from both CompS and the environment and feeds the data to the DoneS module. The modules IntfS, DoneS, and CompS communicate with each other using point-to-point communication links. Valid bits (valid_1, valid_2 and valid_3) are used to validate the presence of meaningful data on these links. For instance, if IntfS wants to send two numbers to DoneS, it places the numbers in aout and bout, and sets valid_1 to TRUE. Each of these communications is assumed to complete in one round.

Fig. 2. GCD implementation: (a) implementation, (b) implementation with witness modules

While GCDSp1 requires only one round to compute the GCD, the module GCDSp2 requires multiple rounds depending on the data inputs. We add an additional variable inprogress in module GCDSp2 and set it to TRUE whenever a GCD computation is in progress. Using this variable and the Sample operator in Section 4, we will formally state how GCDSp2 refines GCDSp1.

Our final level of refinement uses a single physical broadcast channel for communication between the modules. Time-division multiplexed access (TDMA) is used to share the channel. Communication in the channel is conducted in units called frames. A frame is divided, in time, into several time-slots. Each module is allocated one or more time-slots to send data. There is a beacon module Bc that signals the beginning of a frame. Each module has its local counter that is synchronized on the Bc module’s signal. Once the frame starts, each module sends data in its allocated time-slots. A valid bit sent on the channel indicates if the data being sent in the current time-slot is valid. The allocation of time-slots to modules is done statically at configuration time, and stays fixed thereafter. Thus, every module knows the identity of the sender in each time-slot. Figure 2(a) shows the block diagram of the implementation Impl. In our example, a frame is divided into 6 slots. The figure also shows the allocation of time-slots within the frame to individual modules: the first two time-slots are given to the Intf module, the next two to the Done module, and the last two to the Comp module. The Intf, Done, and Comp modules are intended to have the same functionality as the specification modules IntfS, DoneS, and CompS from GCDSp2. However, while the communication between modules in GCDSp2 happens in a single round through point-to-point links, the communication between modules in Impl is through a shared channel, and takes several rounds. Let sync be a variable of the module Bc that is set to TRUE whenever the Bc module sends the synchronizing signal. We will use sync and the Sample operator to relate Impl to GCDSp2 in Section 4.
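The static slot allocation described above can be pictured with a small Python sketch. It assumes, purely for illustration, that one time-slot lasts exactly one round and that the beacon raises sync in the first slot of every frame; the names `FRAME_SLOTS`, `SLOT_OWNER`, `sender` and `sync` are hypothetical and do not come from the original design.

```python
FRAME_SLOTS = 6

# Static allocation: two slots each for Intf, Done and Comp, in that order.
SLOT_OWNER = ["Intf", "Intf", "Done", "Done", "Comp", "Comp"]


def sender(round_index: int) -> str:
    """Module that owns the slot of the given round (rounds counted from a frame start)."""
    return SLOT_OWNER[round_index % FRAME_SLOTS]


def sync(round_index: int) -> bool:
    """Assumed beacon behaviour: TRUE exactly in the first slot of each frame."""
    return round_index % FRAME_SLOTS == 0
```

Because the table is fixed, every module can evaluate `sender` locally from its own counter, which is how each module knows the identity of the sender in each time-slot.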

3 Transition Constraints and Modules 

Transition constraints. Reactive Modules is a formalism for the modular description of systems with heterogeneous components. The definition of reactive modules can be found in [AH96]. In this paper, we generalize modules to transition constraints. A transition constraint is a temporal constraint on a set of variables. The state of a transition constraint is determined by values of two kinds of variables, namely, observable variables and private variables. The state of a transition constraint changes in a sequence of rounds. The first round is called the initial round, and determines initial values for all variables. Each subsequent round is called an update round, and determines new values for all variables. A sequence of states such that the first state results from an initial round, and each successive state in the sequence results from the previous state by an update round, is called a trajectory. The projection of a trajectory to the observable variables is called a trace. The semantics of a transition constraint is its set of traces.

A state of a transition constraint A is a valuation for all its variables. We use primed variables to denote the new value in a round, and unprimed variables to denote the old value. Formally, the transition constraint A contains two predicates, namely, an initial predicate and an update predicate. The initial predicate is a boolean function over the primed variables of A. A state s of A is initial if it satisfies the initial predicate of A. The update predicate is a boolean function over both the primed and unprimed variables of A. Given two states s and t, we write $s \to_A t$ if the update predicate of A evaluates to TRUE when its unprimed variables are assigned values from s and its primed variables are assigned values from t. A trajectory of A is a finite sequence $s_0, \ldots, s_n$ of states such that (1) $s_0$ is an initial state of A, and (2) for $i \in \{0, 1, \ldots, n-1\}$, we have $s_i \to_A s_{i+1}$. The states that lie on trajectories are called reachable. An observation of A is a valuation for the observable variables of A. If s is a valuation to a set of variables, we use $[s]_A$ to denote the valuation restricted to the observable variables of A. For a state sequence $\bar{s} = s_0, \ldots, s_n$, we write $[\bar{s}]_A = [s_0]_A, \ldots, [s_n]_A$ for the corresponding observation sequence. If $\bar{s}$ is a trajectory of A, then the projection $[\bar{s}]_A$ is called a trace of A. Sometimes there is a need to enforce a transition constraint only for a fixed number of update steps. We use $A^\tau$ to denote the transition constraint with the same variables as A, which restricts their valuations only up to $\tau$ update rounds. Formally, a state sequence $\bar{s} = s_0, \ldots, s_n$ is a trajectory of $A^\tau$ if (1) $n \le \tau$ and $\bar{s}$ is a trajectory of A, or (2) $n > \tau$ and $s_0, \ldots, s_\tau$ is a trajectory of A. Note that if $\tau < 0$, then every state sequence is a trajectory of $A^\tau$.
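The definitions above can be made concrete with a minimal explicit-state sketch in Python. It is an illustration under simplifying assumptions, not the representation used by Mocha: states are dictionaries from variable names to integers, predicates are ordinary Python functions, and the class and method names (`TransitionConstraint`, `is_trajectory`, `is_trajectory_up_to`, `trace`) as well as the example `counter` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence

State = Dict[str, int]


@dataclass
class TransitionConstraint:
    observable: FrozenSet[str]
    private: FrozenSet[str]
    init: Callable[[State], bool]            # initial predicate
    update: Callable[[State, State], bool]   # update predicate on (old, new)

    def is_trajectory(self, states: Sequence[State]) -> bool:
        """s_0 is initial and every consecutive pair satisfies the update predicate."""
        if not states or not self.init(states[0]):
            return False
        return all(self.update(s, t) for s, t in zip(states, states[1:]))

    def is_trajectory_up_to(self, states: Sequence[State], tau: int) -> bool:
        """Trajectory of A^tau: only the first tau update rounds are constrained."""
        if tau < 0:
            return True
        return self.is_trajectory(states[: tau + 1])

    def trace(self, states: Sequence[State]) -> List[State]:
        """Project every state onto the observable variables."""
        return [{v: s[v] for v in self.observable} for s in states]


# Example: a counter modulo 3 with a single observable variable x.
counter = TransitionConstraint(
    observable=frozenset({"x"}),
    private=frozenset(),
    init=lambda s: s["x"] == 0,
    update=lambda s, t: t["x"] == (s["x"] + 1) % 3,
)
```

Note that nothing forces `update` to be satisfiable from every state, which reflects the fact that transition constraints need not be executable.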

Modules. Modules are a special class of transition constraints with two special properties: (1) differentiation between inputs and outputs, and (2) executable non-blocking semantics. The observable variables of a module are partitioned into external variables and interface variables. External variables are updated by the environment and can be read by the module, and interface variables are updated by the module and can be read by the environment. The interface and private variables of a module are called controlled variables.

The state of a reactive module changes again in a sequence of rounds. For the external variables, the values in the initial and update rounds are left unspecified (i.e., chosen nondeterministically). For the controlled variables, the values in the initial and update rounds are specified by (possibly nondeterministic) guarded commands. A formal description of modules can be found in [AH96].

Parallel composition. The composition operation combines two transition constraints into a single transition constraint whose behavior captures the interaction between the two components. Two transition constraints A and B are compatible if their sets of private variables are disjoint. Given two compatible transition constraints A and B, the composition $A \| B$ is a transition constraint. The observable variables of $A \| B$ are the union of the observable variables of A and B. The private variables of $A \| B$ are the (disjoint) union of the private variables of A and B. The initial predicate of $A \| B$ is the conjunction of the initial predicates of A and B, and the update predicate of $A \| B$ is the conjunction of the update predicates of A and B.
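As a minimal sketch of this definition, composition amounts to taking unions of the variable sets and conjunctions of the predicates. The representation below (a named tuple `TC` with predicates as Python functions over dictionary states) and the two example constraints are assumptions made for illustration only.

```python
from typing import Callable, Dict, FrozenSet, NamedTuple

State = Dict[str, int]


class TC(NamedTuple):
    observable: FrozenSet[str]
    private: FrozenSet[str]
    init: Callable[[State], bool]
    update: Callable[[State, State], bool]


def compatible(a: TC, b: TC) -> bool:
    """Compatibility as defined above: the private variable sets are disjoint."""
    return a.private.isdisjoint(b.private)


def compose(a: TC, b: TC) -> TC:
    if not compatible(a, b):
        raise ValueError("private variables must be disjoint")
    return TC(
        observable=a.observable | b.observable,
        private=a.private | b.private,
        init=lambda s: a.init(s) and b.init(s),
        update=lambda s, t: a.update(s, t) and b.update(s, t),
    )


# Example: inc_x increments x; copy_x sets the new y to the old x.
inc_x = TC(frozenset({"x"}), frozenset(),
           lambda s: s["x"] == 0,
           lambda s, t: t["x"] == s["x"] + 1)
copy_x = TC(frozenset({"x", "y"}), frozenset(),
            lambda s: s["y"] == 0,
            lambda s, t: t["y"] == s["x"])
both = compose(inc_x, copy_x)
```

Since both update predicates must hold in every round, each component acts as a constraint on the variables it shares with the other.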

Refinement. The notion that two transition constraints describe the same system at different levels of detail is captured by the refinement relation between transition constraints. The transition constraint B is refinable by transition constraint A if every observable variable of B is an observable variable of A. The transition constraint A refines the transition constraint B, written $A \preceq B$, if (1) B is refinable by A, and (2) for every trajectory $\bar{s}$ of A, the projection $[\bar{s}]_B$ is a trace of B.

4 The Sample Operator 

Let A be a transition constraint and $\varphi$ be a predicate on the primed and unprimed observable variables of A. We define B = (Sample A at $\varphi$) to be a transition constraint. The private and observable variables of B are the private and observable variables of A. The initial predicate of B is equal to the initial predicate of A. The update predicate of B is TRUE at the pair of states $(s, t)$ iff there is a sequence of states $s_0, \ldots, s_n$ such that (1) $s = s_0$, (2) $t = s_n$, (3) $s_i \to_A s_{i+1}$ for $i = 0, 1, \ldots, n-1$, and (4) $\varphi(s_i, s_{i+1})$ is FALSE for $i = 0, 1, \ldots, n-2$ and TRUE for $i = n-1$. Informally, B updates from state s to t if there exists a sequence of rounds of A starting at s and ending at t, such that the final round satisfies $\varphi$, and none of the intermediate rounds satisfy $\varphi$. Given a trajectory of A, we use the term sampling instants to refer to the instants in the trajectory where $\varphi$ is true.

We note that the Sample operator differs from the Next operator of [AH96] in two ways: (1) Sample operates on transition constraints, whereas Next operates on modules, and (2) Sample does not constrain the environment between sampling instants, whereas Next constrains the environment to not change between sampling instants. In the rest of this paper, whenever we write B = (Sample A at $\varphi$), we assume that $\varphi$ is a predicate on the primed and unprimed observable variables of A.
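For a finite, explicitly enumerated update relation, the update predicate of (Sample A at φ) can be computed by a simple search. The following Python sketch is an illustration under that assumption; it is not the algorithm implemented in Mocha, and the name `sample_updates` and the edge-list representation are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

State = Hashable
Edge = Tuple[State, State]


def sample_updates(updates: Iterable[Edge],
                   phi: Callable[[State, State], bool]) -> Set[Edge]:
    """Pairs (s, t) joined by rounds of A where only the last round satisfies phi."""
    succ: Dict[State, List[State]] = {}
    for s, t in updates:
        succ.setdefault(s, []).append(t)

    sampled: Set[Edge] = set()
    for start in list(succ):
        # Explore states reachable from `start` through rounds that do NOT satisfy phi.
        seen = {start}
        frontier = deque([start])
        while frontier:
            s = frontier.popleft()
            for t in succ.get(s, []):
                if phi(s, t):
                    sampled.add((start, t))   # final round satisfies phi
                elif t not in seen:
                    seen.add(t)               # intermediate round, keep exploring
                    frontier.append(t)
    return sampled
```

For a counter modulo 4 with rounds 0→1→2→3→0 and φ taken to hold when the new value is even, the sampled relation is {(0, 2), (1, 2), (2, 0), (3, 0)}: the intermediate states are explored and then discarded.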

Example. Recall the GCD computation example from Section 2. The module GCDSp1 is the high-level specification. The intermediate-level specification and implementation modules are composed as follows:

$$\mathit{GCDSp2} = \mathit{IntfS} \,\|\, \mathit{DoneS} \,\|\, \mathit{CompS}$$
$$\mathit{Impl} = \mathit{Bc} \,\|\, \mathit{Ch} \,\|\, \mathit{Intf} \,\|\, \mathit{Done} \,\|\, \mathit{Comp}$$

We wish to relate the intermediate specification GCDSp2 to the high-level specification GCDSp1. Recall that GCDSp1 computes the GCD in one round, whereas GCDSp2 takes multiple rounds. Also recall that inprogress is a variable of GCDSp2 that is set to TRUE when GCDSp2 is doing the GCD computation. If we sample the behaviors of GCDSp2 at the instants where inprogress is FALSE, the sampling should conform to the behaviors allowed by GCDSp1. Using the Sample operator, we can express this requirement as:

$$(\mathit{Sample}\ \mathit{GCDSp2}\ \mathit{at}\ (\neg \mathit{inprogress}')) \preceq \mathit{GCDSp1}$$

We also wish to relate GCDSp2 to the final implementation Impl. Recall that every communication in GCDSp2 happens in a single round, whereas every communication in Impl takes several rounds (as many rounds as it takes to transmit a frame) to complete. Recall that the Bc module has a variable sync that is set to TRUE when it sends the synchronization signal. Though Impl and GCDSp2 operate at different time scales, if we consider any trace of Impl and sample only the instants where sync is TRUE, the resulting subsequence should be a trace of GCDSp2. Using the Sample operator, we can express this requirement as:

$$(\mathit{Sample}\ \mathit{Impl}\ \mathit{at}\ \mathit{sync}') \preceq \mathit{GCDSp2}$$
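The effect of sampling on a single trace can be sketched in Python. The function below keeps the initial observation and every observation reached by a round that satisfies the sampling predicate; the function name, the dictionary representation of observations and the field names in the example are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Obs = Dict[str, object]


def sample_trace(trace: Sequence[Obs],
                 phi: Callable[[Obs, Obs], bool]) -> List[Obs]:
    """Keep the initial observation and each observation reached by a round satisfying phi."""
    if not trace:
        return []
    sampled = [trace[0]]
    for prev, cur in zip(trace, trace[1:]):
        if phi(prev, cur):      # phi reads both unprimed (prev) and primed (cur) values
            sampled.append(cur)
    return sampled


# Sampling at sync': keep the rounds in which the new value of sync is TRUE.
def at_sync(prev: Obs, cur: Obs) -> bool:
    return bool(cur["sync"])
```

Observations after the last sampling instant are dropped, since no sampled update has completed there yet.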

Properties of Sample. The following proposition asserts the distributivity of Sample with respect to the parallel-composition operator.

Proposition 1. [Distributivity of Sample] If A, B are any two transition constraints and $\varphi$ is a predicate on the observable variables common to A and B, then (Sample $(A \| B)$ at $\varphi$) $\preceq$ (Sample A at $\varphi$) $\|$ (Sample B at $\varphi$).

In practice, the traces of (Sample A at $\varphi$) $\|$ (Sample B at $\varphi$) form a large superset of the traces of (Sample $(A \| B)$ at $\varphi$). It is desirable to constrain the observable variables of A and B while distributing the Sample operator over parallel composition. We can strengthen the above proposition in the presence of suitable transition constraints $T_A$ and $T_B$ on the observable variables of A and B, respectively. The resulting proposition, given below, will be used in the next section to carry out assume-guarantee reasoning between different time scales.

Proposition 2. [Constrained distributivity of Sample] Consider transition constraints A, B, $T_A$ and $T_B$ such that every variable of $T_A$ is an observable variable of A, and every variable of $T_B$ is an observable variable of B. If $A \| B \preceq T_A \| T_B$ and $\varphi$ is a predicate on the observable variables common to A and B, then (Sample $(A \| B)$ at $\varphi$) $\preceq$ (Sample $(A \| T_A)$ at $\varphi$) $\|$ (Sample $(B \| T_B)$ at $\varphi$).

5 Refinement Checking Methodology 

We generalize the methodology for assume-guarantee style refinement checking 
given in [HQR98] to accommodate the Sample operator. 

Witness modules. The problem of checking if $A \preceq B$ is PSPACE-hard in the state space of B. However, the refinement check is simpler in the special case in which all variables of B are observable. The transition constraint B is projection refinable by transition constraint A if (1) B is refinable by A, and (2) B has no private variables. If B is projection refinable by A, then every variable of B is observable in both B and A. Therefore, checking if $A \preceq B$ reduces to checking if for every trajectory $\bar{s}$ of A, the projection $[\bar{s}]_B$ is a trajectory of B. This can be done by a transition-invariant check [HQR98], whose complexity is linear in the state spaces of both A and B. If B is projection refinable by A, then $A \preceq B$ iff (1) if s is an initial state of A, then $[s]_B$ is an initial state of B, and (2) if s is a reachable state of A and $s \to_A t$, then $[s]_B \to_B [t]_B$.
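Conditions (1) and (2) can be checked by a single reachability pass over A. The Python sketch below illustrates such a check for an explicitly enumerated, finite-state A; it is a simplified stand-in for the transition-invariant check of [HQR98], and the function name and parameters (`a_init`, `a_succ`, `b_init`, `b_update`, `project`) are hypothetical.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def projection_refines(a_init: Iterable[Hashable],
                       a_succ: Callable[[Hashable], Iterable[Hashable]],
                       b_init: Callable[[Hashable], bool],
                       b_update: Callable[[Hashable, Hashable], bool],
                       project: Callable[[Hashable], Hashable]) -> bool:
    """Check conditions (1) and (2) on all reachable states of A."""
    initial = list(a_init)
    # (1) every initial state of A projects to an initial state of B
    if not all(b_init(project(s)) for s in initial):
        return False
    seen = set(initial)
    frontier = deque(initial)
    while frontier:
        s = frontier.popleft()
        for t in a_succ(s):
            # (2) every reachable transition of A projects to a transition of B
            if not b_update(project(s), project(t)):
                return False
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return True
```

Each reachable state and transition of A is visited once, and B is only consulted through its two predicates, which is why no search over hidden state of B is needed in the projection-refinable case.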

Suppose that B is refinable by A, but not projection refinable. This means that there are some private variables in B. Define $B^u$ to be the transition constraint obtained by making every private variable of B an observable variable. If we compose A with a module W whose observable variables include the private variables of B, then $B^u$ is projection refinable by the composition $A \| W$. Moreover, if W does not constrain any observable variables of A, then $A \| W \preceq B^u$ implies $A \preceq B$ (in fact, A is simulated by B). Such a module W is called a witness to the refinement $A \preceq B$. We require W to be a module, because we need to enforce that W does not constrain the observable variables of A, and that it does not deadlock. In order to check refinement, it is sufficient to first find a witness module and then check projection refinement. If the Sample operator needs to be applied to the implementation to relate it to the specification, the witness could be composed either before or after applying the Sample operator.

Proposition 3. [Sampled Witnesses] Consider two transition constraints A and B, and a predicate $\varphi$ on the variables of A such that B is refinable by A. Let W be a module such that the interface variables of W include the private variables of B, and are disjoint from the observable variables of A. Then (1) $B^u$ is projection refinable by (Sample $(A \| W)$ at $\varphi$), and (2) (Sample $(A \| W)$ at $\varphi$) $\preceq B^u$ implies (Sample A at $\varphi$) $\preceq B$.

Proposition 3 is a generalization of a proposition from [HQR98]. The latter can be obtained by setting $\varphi$ to TRUE in Proposition 3.

Assume-guarantee reasoning. The state space of a transition constraint may be exponential in the size of its description. Consequently, even checking projection refinement may not be feasible. However, typically both the implementation A and the specification B consist of the parallel composition of several transition constraints, in which case it may be possible to reduce the problem of checking if $A \preceq B$ to several subproblems that involve smaller state spaces. The assume-guarantee rule allows us to conclude $A \preceq B$ as long as each component of the specification B is refined by the corresponding components of the implementation A within a suitable environment. The apparent circularity in the environment assumptions is resolved by an induction over time. If we operate with modules, the acyclic await dependencies of legal modules break this circularity. Since our purpose in this paper is to carry out assume-guarantee reasoning with the Sample operator, which gives transition constraints, not modules, we impose a well-founded order on the specification components [McM98] to break the circularity.

Proposition 4. [Assume-Guarantee [McM98]] Let $A = A_1 \| A_2 \| \cdots \| A_n$ and $B = B_1 \| B_2 \| \cdots \| B_m$ be transition constraints. Let $\prec$ be a well-founded order on the components of B, let $Z(B_i) = \{ B_j \mid B_j \prec B_i \}$, and let $Z^c(B_i) = \{ B_j \mid B_j \notin Z(B_i) \}$. For each $B_i$, let $C_i$ be some composition of transition constraints from A, let $D_i$ be some composition of transition constraints from $Z(B_i)$, and let $E_i$ be some composition of transition constraints from $Z^c(B_i)$. If $C_i^\tau \| D_i^\tau \| E_i^{\tau-1} \preceq B_i^\tau$ for all $1 \le i \le m$ and $\tau \in \mathbb{N}$, then $A \preceq B$.

Proposition 4 produces proof obligations of the form $C_i^\tau \| D_i^\tau \| E_i^{\tau-1} \preceq B_i^\tau$. For each specification component $B_i$, the corresponding implementation component that is intended to implement the functionality specified by $B_i$ is $C_i$, and the constraining environments are $D_i$ and $E_i$. This obligation can be discharged by a reachability computation on $C_i \| D_i \| E_i$. At each stage of the reachability computation, one merely has to check that every transition possible in $C_i \| D_i$ is also allowable in $B_i$. Note that while the transition constraint $E_i$ is used to constrain the reachable states of $C_i \| D_i$, it is not used to constrain the transition-invariant check at each step.
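As an illustration, consider the smallest non-trivial instance of Proposition 4. The following choices are assumptions made only for this example: $A = A_1 \| A_2$, $B = B_1 \| B_2$, the order is $B_1 \prec B_2$, and an empty composition imposes no constraint. Then $Z(B_1) = \emptyset$ and $Z(B_2) = \{B_1\}$, while $B_2 \notin Z(B_1)$. Choosing $C_1 = A_1$, $E_1 = B_2$, $C_2 = A_2$ and $D_2 = B_1$ (with $D_1$ and $E_2$ empty) yields the two obligations:

```latex
\begin{align*}
A_1^{\tau} \,\|\, B_2^{\tau-1} &\preceq B_1^{\tau}, \\
A_2^{\tau} \,\|\, B_1^{\tau}   &\preceq B_2^{\tau},
\end{align*}
```

for all $\tau \in \mathbb{N}$. In this instance $B_1$ may rely on $B_2$ only up to the previous round, whereas $B_2$ may rely on $B_1$ up to the current round, which is how the well-founded order avoids a circular argument.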

While discharging the obligation for $B_i$ according to Proposition 4, we would like to keep the state spaces small. Thus, it is preferable to choose most components of the constraining environment from the specification. Unfortunately, due to lack of detail, the specification does not have sufficiently abstract definitions of internal variables of the implementation. For all transition constraints A, B, and F, if $A \preceq B \| F$ and B is refinable by A, then $A \preceq B$. Thus, we can arbitrarily “enrich” the specification by composing it with new transition constraints. Before applying the assume-guarantee rule, we may add components to the specification and prove $A \preceq B \| F_1 \| \cdots \| F_k$ instead of $A \preceq B$. The new transition constraints $F_1, \ldots, F_k$ are called abstract constraints, and they usually give high-level descriptions for some implementation variables, in order to provide a sufficient supply of abstract components while applying Proposition 4. While witness modules are introduced “on the left” of a refinement relation, abstract constraints are introduced “on the right.”

Suppose the left side of a refinement relation is of the form Sample $(A \| B)$ at $\varphi$.
It is not directly possible to apply the assume-guarantee rule from Proposition 4
in such cases. However, we can distribute the Sample operator with respect to
parallel composition using Proposition 1. In practice, Proposition 1 tends to provide
abstractions that are too coarse to be useful. To see why, imagine $A$ and
$B$ as modules, each constraining the other's inputs. By distributing the Sample
inside the parallel composition, $B$ is allowed to constrain the inputs to $A$ only
at the sampling instants. The inputs to $A$ are essentially unconstrained between
sampling instants. Symmetrically, the inputs to $B$ are constrained at sampling
instants by $A$ and left unconstrained between sampling instants. In several common
situations, the interactions between $A$ and $B$ can be orthogonalized into
(1) functionality, and (2) timing of the communication protocol used for the
interaction. The functionality determines values at the sampling instants, whereas
the timing determines how these values propagate between sampling instants.

Suppose $T_A$ and $T_B$ are transition constraints that specify how the inputs
to $A$ and $B$ behave between sampling instants. Then, we can use Proposition 2
to distribute the Sample operator, while constraining the inputs to $A$ by $T_A$
and the inputs to $B$ by $T_B$. Thus, we get the following generalization of the
assume-guarantee rule, with the Sample operator.

Theorem 1. [Assume-Guarantee with Sample] Let $A = A_1 \| A_2 \| \cdots \| A_n$ and
$B = B_1 \| B_2 \| \cdots \| B_m$ be transition constraints. Let $T_i$ be a transition constraint
on the observable variables of $A_i$, for $1 \le i \le n$. Let $\prec$ be a well-founded order
on the components of $B$, and let $T$, $Z$ and $Z^c$ be defined as follows:
$T = \{T_1, \ldots, T_n\}$, $Z(B_i) = \{B_j \mid B_j \prec B_i\}$ and $Z^c(B_i) = \{B_j \mid B_j \notin Z(B_i)\}$. For each
$B_i$, let $C_i$ be some composition of transition constraints from $A$, let $U_i$ be some
composition of transition constraints from $T$, let $D_i$ be some composition of
transition constraints from $Z(B_i)$, and let $E_i$ be some composition of transition
constraints from $Z^c(B_i)$. If $A \preceq T_1 \| \cdots \| T_n$, and $\varphi$ is any predicate such that
$(\mathit{Sample}\ (C_i \| U_i)\ \text{at}\ \varphi)^\tau \| D_i^\tau \| E_i^{\tau-1} \preceq B_i^\tau$ for all $1 \le i \le m$ and $\tau \in \mathbb{N}$, then

$(\mathit{Sample}\ A\ \text{at}\ \varphi) \preceq B$.

In the above theorem, note that the antecedent $A \preceq T_1 \| \cdots \| T_n$ can itself be
discharged by traditional assume-guarantee reasoning (Proposition 4).

6 Refinement Proof 

Recall the high-level specification GCDSp1, intermediate specification GCDSp2,
and implementation Impl from the previous sections. As stated in Section 4, the
refinements we would like to verify are:

$(\mathit{Sample}\ \mathit{GCDSp2}\ \text{at}\ (\neg\mathit{inprogress}')) \preceq \mathit{GCDSp1}$
$(\mathit{Sample}\ \mathit{Impl}\ \text{at}\ \mathit{sync}') \preceq \mathit{GCDSp2}$

In this section, we will focus on how to carry out the second refinement, which
relates the intermediate specification GCDSp2 to the implementation Impl. We
first observe that GCDSp2 is not projection refinable by Impl, due to the presence
of private variables in GCDSp2 that represent point-to-point communication
channels. The module Impl has a single channel that is shared by all
modules using TDMA. The specification, while more abstract in time, provides
individual point-to-point channels for communication. Each round of GCDSp2
corresponds to multiple rounds of Impl, during which a frame is transmitted. It
is possible to relate the values that appear on the shared implementation channel,
at particular time-slots during the communication of a frame, to values that
appear on particular point-to-point channels in the specification. For instance,
the values of the specification variables aout and bout at the end of each round
are expected to be equal to the values occurring in time-slots 1 and 2 of the
frame. We can write a witness module IntfW that looks at the implementation
channel during the transmission of the frame, collects the values at time-slots 1
and 2, and assigns them to aout and bout, respectively. If the values in time-slots
1 and 2 are valid, then valid1 is set to TRUE. Further, the values assigned to
aout, bout, and valid1 are retained until the end of the frame. Similar witness
modules DoneW and CompW can be written. All these witnesses take inputs
from the channel as shown in Figure 2(b). Let ImplW be the module given by

$\mathit{ImplW} = \mathit{Impl} \| \mathit{IntfW} \| \mathit{DoneW} \| \mathit{CompW}$.

Note that the witnesses do not interfere with the channel; they merely observe
the values on the channel. Due to Proposition 3, it suffices to check that
(Sample ImplW at sync') $\preceq$ GCDSp2. Recall GCDSp2 = IntfS $\|$ DoneS $\|$ CompS.
We use the order DoneS $\prec$ CompS $\prec$ IntfS and apply Theorem 1. Let us consider
the component CompS of GCDSp2. The component of Impl that is intended to
implement the functionality of CompS is Comp. We wish to check whether

$(\mathit{Sample}\ \mathit{Comp}\ \text{at}\ \mathit{sync}')^\tau \| \mathit{DoneS}^\tau \preceq \mathit{CompS}^\tau$.

This check fails because the outputs of module CompS, namely ain, bin, and
valid3, are not present in module Comp. Adding the witness module and appropriately
constraining it, we obtain the obligation:

$(\mathit{Sample}\ (\mathit{Comp} \| \mathit{CompW} \| \mathit{Ch} \| \mathit{Bc})\ \text{at}\ \mathit{sync}')^\tau \preceq \mathit{CompS}^\tau$.

This still fails, because we have not constrained the inputs of Comp. In this
obligation, the specification CompS looks at the inputs small, big, and valid2 in
every round and produces corresponding outputs ain, bin, and valid3 in the next
round. The implementation Comp anticipates two values at time-slots 3 and 4
(which correspond to small and big, respectively) of every frame. If these inputs
are valid, then Comp generates values in time-slots 5 and 6 (which correspond
to ain and bin, respectively) of the next frame. The module Comp makes the
following assumptions: (1) the inputs are available at time-slots 3 and 4, (2) either
both inputs are available at a given frame, or none of the inputs are available,
and (3) if both inputs are available, the first input (from time-slot 3) is smaller
than the second input (from time-slot 4). In our refinement obligation, the inputs
to Comp have to be constrained both at the sampling instants and between
sampling instants, in order to satisfy the above assumptions. The specification
component DoneS supplies the constraint at sampling instants, which ensures
that assumptions 2 and 3 are satisfied. The timing assumption DoneW constrains
the inputs between sampling instants and ensures that assumption 1 is
satisfied. Thus we get the proof obligation

$(\mathit{Sample}\ (\mathit{Comp} \| \mathit{CompW} \| \mathit{Ch} \| \mathit{Bc} \| \mathit{DoneW})\ \text{at}\ \mathit{sync}')^\tau \| \mathit{DoneS}^\tau \preceq \mathit{CompS}^\tau$.

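Similarly, we can verify the correctness of modules Done and Intf separately.
The complete refinement proof, which is a direct application of Theorem 1, uses
the ordering DoneS $\prec$ CompS $\prec$ IntfS:

$$
\frac{\begin{array}{l}
(\mathit{Sample}\ (\mathit{Done} \| \mathit{DoneW} \| \mathit{Ch} \| \mathit{Bc} \| \mathit{IntfW})\ \text{at}\ \mathit{sync}')^\tau \preceq \mathit{DoneS}^\tau \\
(\mathit{Sample}\ (\mathit{Comp} \| \mathit{CompW} \| \mathit{Ch} \| \mathit{Bc} \| \mathit{DoneW})\ \text{at}\ \mathit{sync}')^\tau \| \mathit{DoneS}^\tau \preceq \mathit{CompS}^\tau \\
(\mathit{Sample}\ (\mathit{Intf} \| \mathit{IntfW} \| \mathit{Ch} \| \mathit{Bc} \| \mathit{CompW})\ \text{at}\ \mathit{sync}')^\tau \| \mathit{CompS}^\tau \| \mathit{DoneS}^\tau \preceq \mathit{IntfS}^\tau
\end{array}}{(\mathit{Sample}\ \mathit{ImplW}\ \text{at}\ \mathit{sync}') \preceq \mathit{GCDSp2}}
$$

Each of the obligations above the line involves a single implementation component,
possibly along with specification components, witnesses and abstract
constraints to constrain the inputs. They can be automatically discharged by
Mocha. We thus conclude that (Sample Impl at sync') $\preceq$ GCDSp2.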

7 VGI Processor Verification 

The approach described in this paper has been used in a case study [HLQR99] 
to verify a large parallel DSP processor. The VGI chip [STUR98] is an array of 
DSP processors designed to be part of a system for web-based image processing 
[SR97]. The VGI chip contains a total of 96 processors and has approximately 6M 
transistors. Of the 96 processors, 64 are 3-stage pipelined compute processors. 
Each compute processor has about 30,000 logic gates. Data is communicated 
between the processors by means of FIFO queues. No assumptions are made
about the relative speeds at which data is produced and consumed in the pro- 
cessors. Hence, to transfer data reliably, an elaborate handshake mechanism is 
used between the sender and the receiver. In addition, the interaction between 
the control of the pipeline and the control of the communication unit is quite 
complex. The details of the VGI processor verification can be found in [HLQR99]. 
Here, we only discuss the role of the Sample operator in the verification of VGI.

The implementation has a clock signal clk with activity on both the HIGH
and LOW phases in different parts of the design. For instance, in the execute phase
of the pipeline a bus carries an operand when clk is HIGH and the result when
clk is LOW. But the specification does not mention clk at all. In fact, the whole
computation happens in just one step. Thus, one round of the specification
corresponds to two rounds of the implementation, one with clk = HIGH and one with
clk = LOW. Therefore, we sample the implementation whenever clk is LOW and
check if the sampled behavior is present in the specification.
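
As a rough trace-level illustration of this sampling idea (the Sample operator itself is defined on transition constraints, so this is only an analogy), the sketch below keeps the states of an implementation trace at which clk is LOW. The trace, the `bus` variable values, and the helper name `sample` are made up for the example.

```python
def sample(trace, predicate):
    """Keep only the states of `trace` at which `predicate` holds."""
    return [state for state in trace if predicate(state)]

# Hypothetical implementation trace: the bus carries an operand while clk is
# HIGH and the result while clk is LOW.
impl_trace = [
    {"clk": "HIGH", "bus": 7},
    {"clk": "LOW", "bus": 12},
    {"clk": "HIGH", "bus": 3},
    {"clk": "LOW", "bus": 8},
]

sampled = sample(impl_trace, lambda s: s["clk"] == "LOW")
# One sampled state per specification round; only the results remain visible.
spec_view = [s["bus"] for s in sampled]
```

Two implementation rounds collapse into one sampled round, which is the granularity at which the specification is stated.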

In the remainder of this section, we use $\varphi$ to refer to clk = LOW. Our goal
is to verify that an arbitrary network of compute processors implements its corresponding
ISAs (instruction set architectures), using refinement checking. Let
$P_1, P_2, \ldots, P_n$ be the compute processors, and let $Q_1, Q_2, \ldots, Q_n$ be their
respective ISAs. The verification problem is to check that

$\mathit{Sample}\ (P_1 \| P_2 \| \cdots \| P_n)\ \text{at}\ \varphi \preceq (Q_1 \| Q_2 \| \cdots \| Q_n)$.

For the correct functioning of a processor it is essential that all input signals
change only when clk is HIGH. Let $T_i$ be a module that says that all input
signals of $P_i$ change only when clk is HIGH. We use Theorem 1 to decompose
this proof as follows:

$$
\frac{\begin{array}{l}
(\mathit{Sample}\ (P_i \| T_i)\ \text{at}\ \varphi) \preceq Q_i \quad \text{for all } 1 \le i \le n \\
P_1 \| P_2 \| \cdots \| P_n \preceq T_1 \| T_2 \| \cdots \| T_n
\end{array}}{\mathit{Sample}\ (P_1 \| P_2 \| \cdots \| P_n)\ \text{at}\ \varphi \preceq Q_1 \| Q_2 \| \cdots \| Q_n}
$$

The second antecedent says that the inputs of any processor in the network
change only when clk is HIGH. Since any input to a processor has to be the
output of some other processor, this antecedent can be discharged easily by
proving that for all $1 \le i \le n$, the outputs of $P_i$ change only when clk is HIGH.
This is an easy proof local to each processor. In the first antecedent, there are $n$
symmetric proof obligations, one for each $P_i$. Each $P_i$ can be in any one of a finite
number of configurations. Moreover, if $P_j$ and $P_k$ are in the same configuration,
then the $j$th and $k$th proofs are identical except for variable renaming. In this
way, we decompose the proof of a 64 processor network to proofs about individual 
processor configurations that have 800 latches each. This is still beyond the scope 
of monolithic model checking. We use Theorem 1 again, to decompose the proof 
for a single processor configuration into about 35 proof obligations and check 
them using Mocha. None of the individual lemmas require more than a few 
minutes on a 625 MHz DEC Alpha 21164. We found and fixed several subtle 
bugs that were unknown to the designers. Most bugs were in the interaction 
between the pipeline and the communication protocol.

Abstract. We propose an algorithm for LTL model checking based on the classification
of the automata and on guided symbolic search. Like most current methods
for LTL model checking, our algorithm starts with a tableau construction and uses 
a model checker for CTL with fairness constraints to prove the existence of fair 
paths. However, we classify the tableaux according to their structure, and use effi- 
cient decision procedures for each class. Guided search applies hints to constrain 
the transition relation during fixpoint computations. Each fixpoint is thus trans- 
lated into a sequence of fixpoints that are often much easier to compute than the 
original one. Our preliminary experimental results suggest that the new algorithm 
for LTL is quite efficient. In fact, for properties that can be expressed in both CTL 
and LTL, the algorithm is competitive with the CTL model checking algorithm. 



1 Introduction 

Successful application of model checking requires strategies to bridge the gap between 
the size of the models and the capacity of the model checkers. Abstraction closes the gap 
from above by eliminating unnecessary detail from the models and decomposing com- 
plex proofs into sequences of simpler ones. Abstraction is fundamental to the practical 
use of model checking. It is important, however, to close the gap also from below, by
increasing the capacity of the model checkers. Indeed, too much reliance on abstraction
inevitably means too much reliance on manual intervention, which in turn entails low
productivity and exposure to errors. 

The symbolic approach to model checking (BDD-based [4,26] and, more recently, 
SAT-based [2]) addresses the complexity issue by representing models, sets of states, 
and paths as solutions to equations. Though not uniformly superior to the approach based 
on the explicit representation of states, symbolic model checking can deal with many 
more states and transitions. On the other hand, it has proved hard to predict whether a 
model will exceed the memory and time limits imposed on a given experiment: Whereas 
models with as many as 5000 state variables have been analyzed successfully without any 
abstraction, other models with 30 state variables turn out to be intractable. In this paper
we propose techniques that improve the performance and robustness of BDD-based
model checking algorithms for linear time properties [24,34]. 

Model checking for Linear Time Logic (LTL) is usually based on converting the 
property into a Büchi automaton (or tableau), composing the automaton and the model,
and finally checking for emptiness of the language of the composed system. The last step 
can be performed by CTL model checking with fairness constraints [6]. In the context 
of this general strategy, our contribution is twofold: First, we propose a classification 
of the automata obtained by translation of the properties; our classification refines the 
one proposed in [20] to three types: general, weak, and terminal automata. We show 
that applying a specific decision procedure to each class results in an algorithm that is 
superior to the standard one both in theory and in practice. Different tableau constructions 
produce automata that may differ according to our classification. We adopt the procedure 
of [15] because it tends to produce automata that are amenable to more efficient decision 
procedures. 

Converting properties into automata and applying specialized decision procedures 
based on the structure of the automaton tends to reduce the number of fixpoints that must 
be computed by the model checker. If the number of fixpoints is reduced to one, on-the- 
fly model checking can be easily applied [1]. In Section 5 we show that this sometimes 
produces substantial savings in memory and CPU time, even when comparing to CTL 
model checking. In general, our experiments confirm and strengthen the observation of 
[20] about the efficiency of LTL model checking. 

Our second contribution is the extension of guided symbolic search from reachability 
analysis [32] to LTL model checking. Guided symbolic search applies constraints to the 
transition relation of the model to make the computation of fixpoints more efficient. The 
constraints are eventually lifted, so that the result of the computation does not differ 
from that of the conventional approach. However, by exploring the state space
not in strict breadth-first fashion, guided search can be substantially more efficient than 
conventional fixpoint computations. The constraints can be seen as hints on the order in 
which transitions should be explored. Effective hints can be derived with only a limited 
understanding of the behavior of the model subjected to verification. In this paper we 
show how to apply hints to both least and greatest fixpoint computations. The asymmetry 
in the two computations is another reason for reducing the number of fixpoints when 
translating LTL properties. 

The rest of this paper is organized as follows. Section 2 reviews the background ma- 
terial. Section 3 discusses the classification of the automata derived from LTL properties 
and the decision procedures for each class, while Section 4 deals with the application of 
guided symbolic search to model checking. Section 5 presents our preliminary experi- 
mental results, and Section 6 summarizes, outlines future work, and concludes. 

2 Preliminaries 

2.1 Linear Time Model Checking 

We adopt the positive normal form (a.k.a. negation normal form) for the specification of
LTL. Given a set of atomic propositions $A$, the standard boolean connectives, and the
temporal operators X (next time), U (until) and R (releases), LTL formulae in positive
normal form are defined as follows:

- true, false, the atomic propositions, and their negations are formulae;

- if $\varphi$ and $\psi$ are formulae, then so are $\varphi \vee \psi$, $\varphi \wedge \psi$, $\mathsf{X}\,\varphi$, $\varphi\,\mathsf{U}\,\psi$, and $\varphi\,\mathsf{R}\,\psi$.

It is customary to define two additional operators: $\mathsf{F}\,\varphi$ abbreviates $\mathit{true}\ \mathsf{U}\ \varphi$ and $\mathsf{G}\,\varphi$
abbreviates $\mathit{false}\ \mathsf{R}\ \varphi$. The boolean connectives $\rightarrow$ and $\leftrightarrow$ are also defined as abbreviations
in the usual way.

We define the semantics of LTL with respect to a Kripke structure $M = \langle S, T, S_0, A, L \rangle$,
where $S$ is the set of states, $T \subseteq S \times S$ is the transition relation, $S_0 \subseteq S$ is the set
of initial states, $A$ is the set of atomic propositions, and $L : S \to 2^A$ is the labeling
function. The transition relation is assumed to be complete; that is, every state has at
least one successor. An infinite path $\pi$ in $M$ is an infinite sequence $s_0, s_1, \ldots$ such that
$(s_i, s_{i+1}) \in T$ for $i \ge 0$. We denote by $\pi^i$ the suffix of $\pi$ starting at $s_i$. The satisfaction
of an LTL formula along path $\pi$ of $M$ is defined as follows.



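$$
\begin{array}{lcl}
\pi \models \mathit{true} & & \\
\pi \not\models \mathit{false} & & \\
\pi \models p & \text{iff} & p \in L(s_0) \text{ for } p \in A \\
\pi \models \neg p & \text{iff} & p \notin L(s_0) \text{ for } p \in A \\
\pi \models \varphi \vee \psi & \text{iff} & \pi \models \varphi \text{ or } \pi \models \psi \\
\pi \models \varphi \wedge \psi & \text{iff} & \pi \models \varphi \text{ and } \pi \models \psi \\
\pi \models \mathsf{X}\,\varphi & \text{iff} & \pi^1 \models \varphi \\
\pi \models \varphi\,\mathsf{U}\,\psi & \text{iff} & \text{there exists } i \ge 0 \text{ such that } \pi^i \models \psi, \text{ and for all } j,\ 0 \le j < i,\ \pi^j \models \varphi \\
\pi \models \varphi\,\mathsf{R}\,\psi & \text{iff} & \text{for all } i \ge 0,\ \pi^i \models \psi, \text{ or there exists } j,\ 0 \le j < i, \text{ such that } \pi^j \models \varphi
\end{array}
$$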



A formula is satisfied in a Kripke structure $M$ if it is satisfied along a path of $M$ such
that $s_0 \in S_0$. A formula is valid in a Kripke structure $M$ if it is satisfied along all paths
of $M$ such that $s_0 \in S_0$. Given an LTL formula in positive normal form, its negation can
be computed by recursively applying De Morgan's Laws and the following identities:
$\neg\mathsf{X}\,\varphi = \mathsf{X}\,\neg\varphi$, and $\neg(\varphi\,\mathsf{U}\,\psi) = \neg\varphi\,\mathsf{R}\,\neg\psi$. Writing the negation in positive normal
form does not change the length of the formula $|\varphi|$, if one assumes that for an atomic
proposition $p$, $|p| = |\neg p|$.^ Therefore we can efficiently solve the validity problem for
$\varphi$ by checking the satisfiability of $\neg\varphi$. This is the approach that we adopt in the sequel.
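
The negation procedure can be sketched as a short recursive function. The tuple encoding of formulae and the names `negate` and `length` are assumptions made for this illustration; the rule for R is the dual of the identity for U given above.

```python
def negate(f):
    """Negate an LTL formula in positive normal form, staying in that form."""
    op = f[0]
    if op == "true":
        return ("false",)
    if op == "false":
        return ("true",)
    if op == "ap":                      # atomic proposition p  ->  not p
        return ("not", f)
    if op == "not":                     # negated atom  ->  atom
        return f[1]
    if op == "or":                      # De Morgan
        return ("and", negate(f[1]), negate(f[2]))
    if op == "and":
        return ("or", negate(f[1]), negate(f[2]))
    if op == "X":                       # not X f = X not f
        return ("X", negate(f[1]))
    if op == "U":                       # not (f U g) = (not f) R (not g)
        return ("R", negate(f[1]), negate(f[2]))
    if op == "R":                       # dual of the rule for U
        return ("U", negate(f[1]), negate(f[2]))
    raise ValueError(f"unknown operator: {op!r}")

def length(f):
    """Formula length, counting an atom and its negation as one symbol each."""
    if f[0] in ("true", "false", "ap", "not"):
        return 1
    return 1 + sum(length(g) for g in f[1:])

# p U X(not q)
formula = ("U", ("ap", "p"), ("X", ("not", ("ap", "q"))))
negated = negate(formula)
```

Under this encoding the negated formula has the same length as the original, which matches the observation in the text.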

Model checking of a linear time property $\varphi$ is usually accomplished by constructing a
Büchi automaton from the formula $\neg\varphi$. This automaton is often referred to as the
tableau of the formula; it accepts runs that visit sets of fair states infinitely often. The
product of the automaton and the model $M$ is then analyzed to see if it contains a
so-called fair cycle. A fair cycle reachable from the initial states signals satisfaction of
$\neg\varphi$; hence, it is a counterexample to the validity of $\varphi$ in $M$. When explicit enumeration
is used, the run time of the model checking algorithm is linear in the size of the model
and exponential in the length of the formula.

In the rest of this paper we shall have occasion to refer to logics other than LTL. 
Computational Tree Logic (CTL) is a branching time logic. Temporal operators are 
always preceded by universal or existential path quantifiers in CTL formulae. The
expressiveness of CTL is not comparable to that of LTL: Properties like AG EF p have no
equivalent in LTL, while F G p (fairness) is not expressible in CTL. Model checkers for
CTL usually allow the user to specify fairness constraints separately from the property.
Both LTL and CTL are subsumed by CTL*, which is in turn subsumed by the $L\mu_2$
fragment of the $\mu$-calculus. The reader interested in the formal definition and a detailed
analysis of these logics is referred to [12].

^ This assumption is valid for BDD-based model checkers.

2.2 Symbolic Model Checking 

The main difficulty in model checking comes from the size of the state space $S$. This
is typically true also of LTL model checking, in spite of the exponential dependence of
the runtime on the length of the formula, because the formulae of interest are usually
short. Symbolic model checking [6,26,2] addresses this concern by representing sets of
states implicitly via their characteristic functions. In this paper we consider BDD-based
symbolic model checking, in which Binary Decision Diagrams [4] are used to represent
the characteristic functions. Although almost all boolean functions have exponentially
sized BDDs [27], symbolic model checkers have been successful on problems that
vastly exceed the capacity of explicit enumeration algorithms. BDDs can be manipulated
efficiently; in particular, algorithms have been devised for the computation of all the
successors (image computation) or predecessors (pre-image computation) of a set of
states according to a given transition relation [10,5,14,31].

Symbolic model checking algorithms for various logics are based on the computation
of fixpoints by repeated image or pre-image computations. In the relational $\mu$-calculus
(see, for instance, [26]), the computation of the states reachable from $S_0$ is expressed by
the formulae

$$
\begin{aligned}
\mathsf{EY}\,p &= \lambda y.\,\exists x.\,T(x, y) \wedge p(x) \\
\mathit{Rch}\,S_0 &= \mu Z.\,S_0 \vee \mathsf{EY}\,Z ,
\end{aligned}
$$

which prescribe a sequence of image computations. Symbolic LTL model checking, on
the other hand, is normally based on the algorithm of Emerson and Lei [13] for the $L\mu_2$
fragment of $\mu$-calculus. If the acceptance condition of the automaton is described by a
set of fair states, $C$, the set of states from which a fair cycle can be reached, Fair, is given
by:

$$
\begin{aligned}
\mathsf{EX}\,p &= \lambda x.\,\exists y.\,T(x, y) \wedge p(y) \\
\mathsf{E}\,p\,\mathsf{U}\,q &= \mu Z.\,q \vee (p \wedge \mathsf{EX}\,Z) \\
\mathit{Fair} &= \nu Z.\,\mathsf{EX}\,\mathsf{E}(Z\,\mathsf{U}\,(Z \wedge C)) .
\end{aligned}
$$

If $\mathit{Fair} \cap S_0 \neq \emptyset$ for the product of $M$ and the automaton for $\neg\varphi$, then there is a fair
path, and the LTL formula $\varphi$ is not valid in $M$. The observation that Fair is the set of
states that satisfy the CTL [9] formula EG true under the fairness constraint $C$ has led to
the use of symbolic model checkers for fair CTL in LTL model checking [6,8].
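
For intuition, the fair-state computation can be sketched with explicit Python sets in place of BDDs. This is a minimal illustration for a single fairness set, assuming the transition relation is given as a set of (source, target) pairs; the function names are made up for the example and the code is not the symbolic algorithm itself.

```python
def pre_image(trans, target):
    """EX: states with at least one successor in `target`."""
    return {s for (s, t) in trans if t in target}

def exists_until(trans, p, q):
    """E p U q as the least fixpoint  mu Z. q or (p and EX Z)."""
    z = set()
    while True:
        new = q | (p & pre_image(trans, z))
        if new == z:
            return z
        z = new

def fair_states(states, trans, fair):
    """Greatest fixpoint  nu Z. EX E(Z U (Z and C))  for one fairness set C."""
    z = set(states)
    while True:
        new = pre_image(trans, exists_until(trans, z, z & fair))
        if new == z:
            return z
        z = new

# Toy graph: 1 <-> 2 is a cycle, 3 has a self-loop, 0 reaches both.
states = {0, 1, 2, 3}
trans = {(0, 1), (1, 2), (2, 1), (0, 3), (3, 3)}
fair_via_2 = fair_states(states, trans, {2})
fair_via_3 = fair_states(states, trans, {3})
```

Each outer iteration contains a complete inner least-fixpoint computation, which is the nesting that the text identifies as a source of cost.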

Many fixpoint computations used in symbolic model checking, including the two just 
mentioned, can be formulated both in terms of image computations (forward traversal of 
the state space) and in terms of pre-image computations (backward traversal). In recent 
times considerable attention has been devoted to the relative efficiency of the alternative 
formulations [19,18,17,28].

Besides the direction of traversal of the state space, an important factor affecting the
efficiency of symbolic model checking algorithms is the presence of multiple fixpoints, 
especially if nested. Thus, the computation of fair states is intrinsically more difficult 
than the computation of reachable states. 



3 Classification of Tableaux 

As outlined in Section 2.1, a linear time property can be checked by converting its 
negation into a Büchi automaton called the tableau of the property, composing the tableau
with the model, and checking language emptiness. The last step of this procedure involves 
the computation of nested fixpoints and is therefore potentially expensive. The question 
naturally arises as to whether LTL model checkers can compete with CTL model checkers 
for those properties that can be expressed in both logics. Kupferman and Vardi [20,21] 
call these properties branchable and observe that many of them translate into tableaux 
with special structure. They claim that an appropriate variant of the LTL model checking 
algorithm can then achieve efficiency comparable to that of CTL model checkers. In this 
section we recall the classification of [20] and refine it in a natural yet effective way. 

Different tableau construction procedures have been proposed in the literature [24, 
34,6,8,15]. Even though the automata produced by these procedures for a given formula 
obviously accept the same language, they have different structures, and therefore are not 
equivalent from the point of view of the classification we propose. We discuss this issue 
at the end of this section. 

A Büchi automaton is a quintuple $\langle \Sigma, Q, Q_0, \delta, F \rangle$, where $\Sigma$ is the input alphabet,
$Q$ is the finite set of states, $Q_0 \subseteq Q$ is the set of initial states, $\delta : Q \times \Sigma \to 2^Q$ is the
transition function, and $F \subseteq Q$ is the acceptance condition. An input word is accepted
iff there is a run of the automaton on that word that visits $F$ infinitely often. We assume
that the transition function is complete, that is, $\delta(q, a) \neq \emptyset$ for all $q \in Q$ and $a \in \Sigma$.

A Büchi automaton is weak [20,29] iff there exists a partition of $Q$ into $Q_1, \ldots, Q_n$
such that each $Q_i$ is either contained in $F$ or disjoint from it; in addition, the blocks of
the partition are partially ordered so that the transitions of the automaton never move
from $Q_i$ to $Q_j$ unless $Q_i \le Q_j$.

Theorem 1. The language of a weak Büchi automaton $A$ is empty iff $A \models \neg\mathsf{EF}\,\mathsf{EG}\,F$.

Proof. A run of a weak Büchi automaton that leaves a block $Q_i$ of the partition cannot
enter it again. Hence, the only way for a run to visit a fair state infinitely often is
to eventually be confined inside one $Q_i \subseteq F$. Such a run therefore is a witness for
$\mathsf{EF}\,\mathsf{EG}\,F$. Conversely, if $A, s_0 \models \mathsf{EF}\,\mathsf{EG}\,F$ for some $s_0 \in S_0$, then there is a fair run in
$A$ and its language is not empty. □

A Büchi automaton is terminal iff it is weak and the blocks of the partition contained in
$F$ are maximal elements of the partial order.

Theorem 2. The language of a terminal Büchi automaton $A$ is empty iff $A \models \neg\mathsf{EF}\,F$.

Proof. An accepting run in a terminal Büchi automaton must reach a maximal block
$Q_i$ of the partition, otherwise no fair state is visited. Conversely, a run that reaches a
maximal block can always be extended to a fair run, because $\delta$ is complete. It is therefore
necessary and sufficient for a fair run to reach a fair state. □

Theorems 1 and 2 provide the foundation for our model checking strategy. The CTL
model checker is used to prove $\neg\mathsf{EG}\,\mathit{true}$ under the fairness constraint $F$, $\neg\mathsf{EF}\,\mathsf{EG}\,F$,
or $\neg\mathsf{EF}\,F$, depending on the classification of the automaton. Correctness follows from
the fact that the composition of the model and a terminal Büchi automaton is a terminal
Büchi automaton, and the composition of the model and a weak Büchi automaton is
weak. Checking whether an automaton is weak or terminal can be done in polynomial
time. The checking of the properties can be carried out by either backward or forward
analysis. Forward analysis applied to terminal automata corresponds to reachability
analysis.
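
One possible polynomial-time classification check is sketched below. It rests on a reformulation that is an assumption of this sketch rather than the paper's stated procedure: an automaton is treated as weak iff no cycle contains both an accepting and a non-accepting state, and as terminal iff, in addition, every successor of an accepting state is accepting. The automaton is given as a hypothetical dictionary `succ` from each state to its set of successors, with arc labels ignored.

```python
def reachable_from(succ, start):
    """All states reachable from `start` in one or more steps."""
    seen, stack = set(), list(succ.get(start, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(succ.get(s, ()))
    return seen

def classify(succ, accepting):
    """Return 'terminal', 'weak', or 'general' for a Buchi automaton."""
    reach = {s: reachable_from(succ, s) for s in succ}
    # Weak: no cycle mixes accepting and non-accepting states.
    for s in succ:
        for t in reach[s]:
            on_common_cycle = s in reach.get(t, ())
            if on_common_cycle and ((s in accepting) != (t in accepting)):
                return "general"
    # Terminal: accepting states can only move to accepting states.
    if all(t in accepting for s in accepting for t in succ.get(s, ())):
        return "terminal"
    return "weak"

# Shapes loosely modeled on automata for p U q, F G p, and G F p.
until_like = {"a": {"a", "acc", "dead"}, "acc": {"acc"}, "dead": {"dead"}}
fg_like = {"w": {"w", "g"}, "g": {"g", "sink"}, "sink": {"sink"}}
gf_like = {"n": {"n", "y"}, "y": {"n", "y"}}
```

The result selects which of the three CTL checks listed above would be run on the composition with the model.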

As pointed out in [16], the Emerson-Lei algorithm, which is quadratic in the size 
of the state space, is often much slower in practice than reachability analysis and CTL 
model checking, which are linear. Our classification has the desirable effect of using the 
more efficient algorithms when possible. 

Several variants of the tableau construction have been proposed in the literature. All
are based on the identity $\varphi\,\mathsf{U}\,\psi = \psi \vee (\varphi \wedge \mathsf{X}(\varphi\,\mathsf{U}\,\psi))$, but they differ in the details.
Figure 1 shows the tableaux produced by the procedures of [8] (left) and [15] (center)
for the formula $f = p\,\mathsf{U}\,q$. It also shows a variant of the tableau of [15] (right) that has
labels on the arcs instead of the states and a complete transition function.





Fig. 1. Tableaux for $p\,\mathsf{U}\,q$. Fair states are indicated by double circles, initial states have an extra
incoming arrow, and negation is indicated by an overbar.



The construction of [8] identifies the elementary subformulae of the given formula $f$
($p$, $q$, and $\mathsf{X}\,f$ in our example) and creates one state in the tableau for each combination
of elementary subformulae of $f$. It then adds a transition $(s, s')$ if, for all elementary
subformulae of the form $\mathsf{X}\,g$, either $\mathsf{X}\,g$ holds in $s$ and $g$ holds in $s'$, or $\mathsf{X}\,g$ does not
hold in $s$ and $g$ does not hold in $s'$.

The procedure of [15] creates states of the tableau on the fly, starting with a node that
represents $f$ and adding nodes according to the syntactic structure of the formula.

The disparity in number of states is the most visible difference between the results of
the two constructions, but it is of little consequence on the efficiency of symbolic model
checking. In fact, since every atomic proposition appearing in $f$ also appears, possibly
negated, in the label of each state, the left tableau shares the state variables for $p$ and
$q$ with the model and only requires one additional bit for $\mathsf{X}\,f$. The other two automata
need two extra bits.

Another difference between the two constructions is of greater import for the effi- 
ciency of the model checker. The automaton on the left of Fig. 1 is not weak, unlike the 
other two (which are indeed terminal). The approach of adding all possible transitions 
to the automaton at once, instead of adding them as they become needed tends to create 
more paths in the tableau; hence, it tends to prevent the partial ordering of the states 
required for weakness. 

Therefore, we use the construction of [15] modified to yield automata with labels 
on the arcs and complete transition functions as in [20]. These modifications allow us to 
easily express the automata in Verilog that we use as input language for our experiments. 
It should be noted that our choice is not optimal from the point of view of the number 
of state variables and transitions of the composition of the model and the property 
automaton. 

4 Guided Search in Model Checking 

4.1 Guided Search for the Computation of Least Fixpoints 

In [32] it is shown that symbolic reachability analysis, and hence invariant checking, 
can be substantially sped up by applying hints. The hints are predicates on the inputs or 
state variables of the model. Their effect is to inhibit some transitions; it is obtained by 
conjoining the hints and the transition relation. Several hints may be applied in sequence. 
Therefore the computation of the reachable states is decomposed into the computation of
a sequence of fixpoints — one for each hint. 

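Theorem 3. Given a sequence of monotonic functionals $\tau_1, \tau_2, \ldots, \tau_k$ such that $\tau_i \le \tau_k$
for $0 < i < k$, the sequence $\rho_0, \rho_1, \ldots, \rho_k$ of least fixpoints defined by

$$
\begin{aligned}
\rho_0 &= 0 \\
\rho_i &= \mu X.\,\rho_{i-1} \vee \tau_i(X), \quad 0 < i \le k
\end{aligned}
$$

monotonically converges to $\rho = \mu X.\,\tau_k(X)$; that is, $\rho_0 \le \rho_1 \le \cdots \le \rho_k = \rho$.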

Proof. We prove by induction that $\rho_i \le \rho$. The basis is trivially established ($\rho_0 = 0 \le \rho$).
For the inductive step we have:

$$
\rho_i = \mu X.\,\rho_{i-1} \vee \tau_i(X) \le \mu X.\,\rho_{i-1} \vee \tau_k(X) = \mu X.\,\tau_k(X) ,
$$

where the last equality follows from the inductive hypothesis and the properties of
fixpoints. The sequence is clearly monotonic, and for $i = k$ the inductive step shows
that $\rho_k = \rho$. □

Decomposing the computation of a least fixpoint may have two main advantages; both
are based on the fact that an appropriately chosen $\tau_i$ (i.e., a properly chosen hint) may
make the computation of $\rho_i$ orders of magnitude faster than the direct computation of
$\rho$ [32]. The first advantage is that one may not need to compute the whole sequence
of fixpoints. For instance, a state that violates an invariant may be contained in $\rho_i$, in
which case the rest of the computation can be avoided. The second advantage applies
also to cases in which the computation of $\rho$ must be carried to completion. Indeed, it
may be much more efficient to compute $\rho$ from $\rho_{k-1}$ than to compute it directly. In this
latter respect symbolic guided search differs from explicit guided search [35], in which
guidance is only used to accelerate the detection of states where invariants do not hold.
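
The decomposition of Theorem 3 can be illustrated on an explicit-state toy model. The following is only a minimal sketch under stated assumptions, not the BDD-based implementation used in VIS: states are integers, the transition relation is a list of pairs, and a hint is a predicate on edges. The names `image`, `lfp_reach`, and `guided_reach` are hypothetical, and the last hint is assumed to be the trivial one, so that the last functional corresponds to the unrestricted transition relation.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[int, int]
Hint = Callable[[Edge], bool]


def image(edges: Iterable[Edge], states: Set[int]) -> Set[int]:
    """Successors of `states` under the given transition relation."""
    return {t for (s, t) in edges if s in states}


def lfp_reach(edges: List[Edge], seed: Set[int]) -> Set[int]:
    """Least fixpoint  mu X. seed | image(X), computed by frontier iteration."""
    reached = set(seed)
    frontier = set(seed)
    while frontier:
        frontier = image(edges, frontier) - reached
        reached |= frontier
    return reached


def guided_reach(edges: List[Edge], init: Set[int],
                 hints: List[Hint]) -> List[FrozenSet[int]]:
    """Compute rho_1, ..., rho_k, one least fixpoint per hint.

    Each fixpoint is seeded with the previous one, as in
    rho_i = mu X. rho_{i-1} | tau_i(X).
    Assumption: the last hint accepts every edge.
    """
    rho: Set[int] = set()                              # rho_0 is empty
    trace = []
    for hint in hints:
        restricted = [e for e in edges if hint(e)]     # hint conjoined with the relation
        rho = lfp_reach(restricted, rho | set(init))
        trace.append(frozenset(rho))
    return trace


# Hypothetical example: the first hint inhibits every edge touching states 4 and 5.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5)]
TRACE = guided_reach(EDGES, {0}, [lambda e: max(e) < 4, lambda e: True])
```

In this sketch an invariant violated in state 3 would already be detected in the first element of `TRACE`, without computing the second fixpoint.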

In [32] evidence is presented in support of the claim that finding good hints requires 
understanding of the system to be verified at a level comparable to that required to write 
functional tests for it. In this paper we extend the use of hints from invariant checking 
to LTL. 



4.2 Guided Search for the Computation of Greatest Fixpoints 

The method presented in [32] applies to the computation of least fixpoints. Least fix- 
points suffice to check invariants, but not for the properties of more expressive logics. 
In this section we therefore describe the extension of guided symbolic search to the 
computation of greatest fixpoints. Hints produce underapproximations of the transition 
relation; therefore, guided symbolic search can complement known methods [22,25,7, 
23], when both lower bounds and upper bounds are required [30]. 

The main objective of the works just cited is to prove the desired property of the 
system on a simplified model. Our objective is complementary: We want to speed up 
the computation of the fixpoints for a given model, by addressing the computational 
bottlenecks; for instance, image computations that are too slow and memory consuming 
because of poor quantification schedule [14,31]. 

Another possible reason for using underapproximations in greatest fixpoints is to
deal with nested fixpoints. If a greatest fixpoint is nested in a least fixpoint, or vice versa,
then by using underapproximations for both computations, one obtains an underapproximation
of the result if either computation is restricted to a prefix of the sequence. (For
instance, $\rho_0, \ldots, \rho_j$ for $j < k$ in the case of a least fixpoint.)

Theorem 4. Given a sequence of monotonic functionals $\tau_1, \tau_2, \ldots, \tau_k$ such that $\tau_i \le \tau_k$
for $0 < i < k$, the sequence $\eta_0, \eta_1, \ldots, \eta_k$ defined by

$\eta_0 = 0$

$\eta_i = \nu X.\, \eta_{i-1} \vee \tau_i(X), \quad 0 < i \le k$

monotonically converges to $\eta = \nu X.\, \tau_k(X)$; that is, $\eta_0 \le \eta_1 \le \cdots \le \eta_k = \eta$.

Fig. 2 (Parts 1 to 3). Greatest fixpoint computation illustrating $\eta_i > \eta_{i-1} \vee \nu X.\, \tau_i(X)$.

The proof is along the lines of the proof of Theorem 3. It should be observed that
$\eta_i \ge \eta_{i-1} \vee \nu X.\, \tau_i(X)$, and that the inequality can be strict, as shown in the example
of Fig. 2. In the example $k = 3$. The rightmost graph can be thought of as the original
system, and the other two graphs as the systems obtained by applying hints. Let $\mathsf{EX}_i X$
compute the predecessors of the states in $X$ in the graph shown in Part $i$ of Fig. 2. Let
$\tau_i = \lambda X.\, p \wedge \mathsf{EX}_i X$. With these definitions, $\eta$ is the set of states along infinite paths where
$p$ always holds. One can verify that $\eta_2 = \{s_1, s_2, s_3\} = \eta > \{s_2\} = \eta_1 \vee \nu X.\, \tau_2(X)$.
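
The sequence of Theorem 4 can be sketched in the same explicit-state style. The code below is an illustration under assumptions, not the symbolic procedure of the paper: the graph is a hypothetical four-state example (it is not the one of Fig. 2), `pre` plays the role of $\mathsf{EX}_i$, and each hint is given directly as a restricted edge list whose last element is the full relation.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def pre(edges: Iterable[Edge], states: Set[int]) -> Set[int]:
    """EX: states that have at least one successor in `states`."""
    return {s for (s, t) in edges if t in states}


def guided_gfp(universe: Set[int], p: Set[int],
               edge_sets: List[List[Edge]]) -> List[FrozenSet[int]]:
    """Compute eta_i = nu X. eta_{i-1} | (p & EX_i X) for each restricted relation."""
    eta: Set[int] = set()                    # eta_0 is empty
    trace = []
    for edges in edge_sets:                  # one restricted relation per hint
        x = set(universe)                    # a greatest fixpoint starts from the top
        while True:
            nxt = eta | (p & pre(edges, x))
            if nxt == x:
                break
            x = nxt
        eta = x
        trace.append(frozenset(eta))
    return trace


# Hypothetical example: p holds in states 1, 2, 3; the hint keeps only the 1 <-> 2 cycle.
FULL = [(1, 2), (2, 1), (2, 3), (3, 3), (3, 4), (4, 4)]
HINTED = [(1, 2), (2, 1)]
TRACE = guided_gfp({1, 2, 3, 4}, {1, 2, 3}, [HINTED, FULL])
DIRECT = guided_gfp({1, 2, 3, 4}, {1, 2, 3}, [FULL])[-1]
```

Here the first fixpoint already exhibits a cycle on which $p$ always holds, which is the situation in which the remaining fixpoints need not be computed.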

Theorem 4 can speed up model checking in two ways. If the greatest fixpoint being 
computed is the outermost fixpoint, as in EG true, then we may not have to compute 
all k fixpoints in the sequence if there is indeed a fair cycle. In these cases, a procedure 
that yields a large subset of the fixpoint at a small fraction of the computational cost 
is desirable. This application of Theorem 4 is complementary to the approach of [16], 
which works best when there are no fair cycles. 

In cases where convergence must be proved, Theorem 4 can still speed up computation
if $R^+$ and $\eta^+$ are given upper bounds on the reachable states and the greatest
fixpoint, and $\eta_i \ge R^+ \wedge \eta^+$ for some $i < k$. This case occurs in the example of Fig. 2, in
which $\eta_2$ equals the trivial upper bound on $\eta$ given by $S$ itself; hence, $\eta_3$ need not be
computed. Finally, if $R^+ \wedge \eta^+ > \eta_i$ for all $i < k$, it may still be that $\eta_{k-1} = \eta$. In this
case, the last iteration of the computation of $\eta_k$ can be skipped because $\eta_{k-1}$ is a lower
bound on $\eta_k$.

Comparing Theorems 3 and 4, the following observations are in order. First of all,
Theorem 3 can be extended by not insisting on the convergence of all the fixpoints except
the last. This is possible because the $j$-th iterate of the computation of $\rho_i$ contains $\rho_{i-1}$
and is contained in $\rho_i$. However, this is not the case for the greatest fixpoint computation.
Another important difference is that knowledge of $\rho_{k-1}$ helps the computation of $\rho_k$
more than knowledge of $\eta_{k-1}$ helps the computation of $\eta_k$. In practice, application of
Theorem 3 tends to be more effective than application of Theorem 4 when convergence
must be reached.

5 Experimental Results

In this section we present preliminary experimental results that we have obtained with 
VIS 1.3 [3], extended to accept hints. CPU times are measured on an IBM Intellistation 
running Linux with an Intel Pentium II 400 MHz CPU and 1 GB of RAM. 

We consider properties that can be expressed in both LTL and CTL. This gives us 
a means to contrast our decision procedure against the CTL decision procedure. The 
results can be found in Table 1. The leftmost column gives the model and the type of 
property. Soap is a model of a token-passing algorithm for distributed mutual exclusion 
[11]. The token is passed along a spanning tree of a network of processors. For this
model we checked a liveness property of the form $\mathsf{G}(p \rightarrow \mathsf{F}\,q)$ and two safety properties
of the form $\mathsf{G}(p \rightarrow \mathsf{X}(q\,\mathsf{R}\,r))$ expressing the requirement that access to the resource is
not granted to a processor unless the processor has requested it. The first of these two
properties fails (error in the specification), while the second passes. Gcd is a model of 
a circuit computing the greatest common divisor of two integers; Pcell is a model of a 
production cell; Palu is a three-stage pipeline with an ALU and a register file; Fpmult is 
a floating point multiplier; finally, NullModem is a circuit to check the correctness of a 
simple UART and of the handshaking between the UART and a processor. 

We ran all the applicable algorithms for each formula and compared CPU time and 
memory requirements. The table also reports the number of pre-images ( EX) and images 
(EY) computed by each approach. These numbers indicate whether model checking 
used pure forward analysis (0 pre-images), pure backward analysis (0 images) or a 
combination of the two: reachability analysis followed by backward model checking. 
For each formula we checked, one of these three methods was clearly superior to the
other two. Results are reported for that method. Some of the results could have been 
improved by using forward CTL [19], but for uniformity all experiments have been 
conducted using backward CTL only. 

The importance of choosing the right algorithm is underlined by the results of Table 1 , 
which were obtained with the same fixed variable order for all the runs of a given model 
and without hints. They confirm the observation of [20] about the comparable efficiency 
of CTL and LTL model checkers when the right algorithms are used for the latter. 
Notice, however, that experiments performed with dynamic variable reordering enabled 
may yield quite different results, simply because the orders end up being different. For
formulae that fail, the ability to check the property on-the-fly may result in a substantial 
advantage to our algorithm. 

NullModem is a special case. The property $\mathsf{GF}\,p \vee \mathsf{FG}\,q$ is not expressible in CTL
without using fairness constraints. All three approaches used for that example use a 
fairness constraint and compute nested fixpoints. The example is included to show that 
there are cases in which the LTL model checking algorithm that uses an extra automaton 
is faster than the standard fair CTL algorithm. 

We next examine the effects of applying hints. Results for invariant checking are 
reported in [32]; hence here we only consider properties of other types. Our current 
implementation only supports hints for properties that translate into terminal automata. 
Therefore Table 2 has results only for a subset of the experiments shown in Table 1. 

The hint used in the two model checking experiments for the Soap model is to prevent 
some of the processors from issuing requests. In the case of the property that fails, it is 
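
Table 1. Comparing model checking approaches for various LTL properties.

| experiment | procedure | time (sec) | EXs+EYs | memory (MB) | peak BDD nodes |
|---|---|---|---|---|---|
| Soap (140 latches), $\mathsf{G}(p \rightarrow \mathsf{F}\,q)$ | CTL | 580 | 55+45 | 635 | 19.4M |
| | ¬EF EG fair | 639 | 56+45 | 644 | 18.8M |
| | ¬EG fair true | 9598 | 311+45 | 637 | 19.3M |
| Soap, $\mathsf{G}(p \rightarrow \mathsf{X}(q\,\mathsf{R}\,r))$ (failing) | CTL | 42 | 16+45 | 146 | 3.4M |
| | ¬EF fair | 8 | 0+13 | 43 | 0.8M |
| | ¬EF EG fair | 260 | 17+53 | 570 | 15.4M |
| | ¬EG fair true | 288 | 32+53 | 571 | 15.4M |
| Soap, $\mathsf{G}(p \rightarrow \mathsf{X}(q\,\mathsf{R}\,r))$ (passing) | CTL | 29 | 4+45 | 99 | 2.1M |
| | ¬EF fair | 78 | 0+45 | 300 | 6.0M |
| | ¬EF EG fair | 80 | 3+45 | 233 | 6.0M |
| | ¬EG fair true | 77 | 3+45 | 233 | 6.0M |
| Gcd (45 latches), $\mathsf{G}(p \rightarrow \mathsf{XF}\,q)$ | CTL | 1131 | 19+0 | 639 | 21.2M |
| | ¬EF EG fair | 291 | 19+0 | 591 | 15.0M |
| | ¬EG fair true | 8831 | 186+11 | 659 | 19.8M |
| Pcell (61 latches), $\mathsf{G}(p \rightarrow p\,\mathsf{U}\,q)$ | CTL | 3 | 47+66 | 23 | 120k |
| | ¬EF EG fair | 4 | 45+80 | 25 | 195k |
| | ¬EG fair true | 56 | 1252+80 | 73 | 973k |
| Palu (99 latches), $\mathsf{G}(p \rightarrow \mathsf{F}\,q)$ | CTL | 2 | 5+0 | 23 | 228k |
| | ¬EF EG fair | 3 | 6+0 | 25 | 285k |
| | ¬EG fair true | 3 | 12+0 | 26 | 394k |
| Fpmult (60 latches), $\mathsf{G}(p \rightarrow \mathsf{XXX}\,q)$ | CTL | 0.2 | 5+0 | 14 | 12.0k |
| | ¬EF fair | 2.9 | 0+6 | 15 | 53.9k |
| | ¬EF EG fair | 0.2 | 6+0 | 14 | 13.6k |
| | ¬EG fair true | 0.2 | 10+0 | 14 | 13.7k |
| NullModem (53 latches), $\mathsf{GF}\,p \vee \mathsf{FG}\,q$ | CTL | 1143 | 28800+388 | 49 | 387k |
| | ¬EF EG fair | 1225 | 28623+388 | 42 | 310k |
| | ¬EG fair true | 797 | 17250+388 | 91 | 1120k |
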
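Table 2. Effects of guided search. The lines without hints are taken from Table 1.

| experiment | hints | time (sec) | EXs+EYs | memory (MB) | peak BDD nodes |
|---|---|---|---|---|---|
| Soap, $\mathsf{G}(p \rightarrow \mathsf{X}(q\,\mathsf{R}\,r))$ (failing) | no | 7.5 | 0+13 | 43 | 773k |
| | yes | 1.0 | 0+13 | 17 | 39k |
| Soap, $\mathsf{G}(p \rightarrow \mathsf{X}(q\,\mathsf{R}\,r))$ (passing) | no | 78 | 0+45 | 300 | 6.0M |
| | yes | 27 | 0+60 | 90 | 2.0M |
| Fpmult, $\mathsf{G}(p \rightarrow \mathsf{XXX}\,q)$ | no | 2.9 | 0+6 | 15 | 53.9k |
| | yes | 1.9 | 0+15 | 15 | 53.6k |
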
indeed possible to generate a counterexample when requests from one processor only
are enabled. Considerable speed-up was obtained with a generic hint not specifically 
targeted at the property. Conversely, the hint that is optimal for the property that fails 
(only one processor is enabled) is not optimal for the property that passes, but still 
improves runtime with respect to the standard algorithm. The time to devise the hint was 
small compared to the time required to formulate the properties. Still, larger test cases 
than those presented in Table 2 will be required to assess the practical impact of guided 
search in symbolic model checking. 



6 Conclusions and Future Work 

In this paper we have presented an efficient algorithm for BDD-based LTL model 
checking based on guided search and specialized decision procedures for classes of 
automata. Our algorithm improves on the standard approach in both theory and practice: 
The selection of the most appropriate decision procedure for an LTL property decreases 
the asymptotic complexity of the model checking algorithm from quadratic in the size 
of the state space to linear in many cases, while our preliminary experimental results 
show that both classification of the automata and the use of hints can have large impacts 
on the runtime and memory requirements. 

Considerable work remains to be done besides the completion of the experimental 
evaluation of our algorithm. We outline a few of the issues that we plan to explore. 

Given an algorithm for CTL model checking with fairness constraints and a tableau 
construction procedure, a model checker for CTL* is readily available. Also, given the 
ability to compute least and greatest fixpoints, a $\mu$-calculus model checker can be built.
Therefore, our guided search approach can be applied to the logics most commonly used 
in model checking. 

We have seen that LTL model checking may be faster than CTL model checking 
for the same property, because of the reduction in the number and alternation depth of 
fixpoints. The opposite may also occur, due to the additional state variables brought by 
the automaton that may increase the sizes of the BDDs manipulated by the model checker. 
Direct translation to $\mu$-calculus [17] should therefore be considered as an alternative to
composition with the automaton. A better understanding of the relation between different 
tableau construction procedures is also desirable. 

The use of hints is not restricted to BDD-based approaches, but should apply to 
SAT-based model checking as well [2]. Finally, partial automation of the extraction of
hints from the model and the property appears as a worthwhile research goal. 

Abstract. Temporal logic and $\omega$-automata are two of the common frameworks
for specifying properties of reactive systems in modern verification tools.
In this paper we unify these two frameworks in the linear
time setting for the specification of stutter-invariant properties, which
are used in the context of partial-order verification.

We present a simple variant of linear time propositional temporal
logic (LTL) for expressing exactly the stutter-invariant $\omega$-regular languages.
The complexity of, and algorithms for, all the relevant decision
procedures for this logic remain essentially the same as with ordinary
LTL. In particular, satisfiability remains PSPACE-complete and temporal
formulas can be converted to at most exponential sized $\omega$-automata.
More importantly, we show that the improved practical algorithms for
conversion of LTL formulas to automata, used in model-checking tools
such as SPIN, which typically produce much smaller than worst-case output,
can be modified to incorporate this extension to LTL with the same
benefits. In this way, the specification mechanism in temporal logic-based
tools that employ partial-order reduction can be extended to incorporate
all stutter-invariant $\omega$-regular properties.



1 Introduction 

Today, $\omega$-automata and $\omega$-regular languages are used in verification tools both to
describe the models of reactive systems and to specify the properties being
verified. While $\omega$-automata typically form the basis for describing the system, a
popular alternative for specifying properties is temporal logic [16], which offers an
intuitive language for describing relationships between the occurrence of events
over time. Linear time temporal logic, which we deal with in this paper, is used
as a specification language in tools including SPIN [6], as well as more recent
versions of SMV [12]. Standard linear time propositional temporal logic (LTL)
can only express a strict subset of the $\omega$-regular properties. To correct this,
various remedies have been proposed (see [23]).

Stutter-invariant languages ([9]) are used in the context of partial-order verification
[21,5,7,14]. In order to take full advantage of partial-order reduction, the
properties that are specified need to be stutter-invariant (formal definitions will
be supplied later). One way to assure that only stutter-invariant properties are
specified is to restrict linear temporal logic, LTL, to only those formulas without
the “next” operator. This is precisely the approach currently employed in the
tool SPIN [6], which exploits partial-order reduction, and uses “next”-free LTL
to specify properties.

LTL without “next” has been shown to express all stutter-invariant LTL-expressible
properties [15]. However, the fact remains that LTL cannot express
all $\omega$-regular properties, and similarly, LTL without “next” cannot specify all
stutter-invariant $\omega$-regular properties.

In this paper we provide a simple variant of LTL for expressing all the stutter-invariant
$\omega$-regular languages, based on an equally simple temporal logic for all
$\omega$-regular languages. Significantly, the complexity of the relevant decision procedures
remains the same as with ordinary LTL. In particular, satisfiability remains
PSPACE-complete, as does model checking for negated formulas. Equally important,
a practical algorithm for converting LTL formulas to automata ([4]), used
in the tool SPIN [6], which typically produces much smaller than worst-case
output, can be adapted to incorporate this extension to LTL with basically
the same benefits. As described in the conclusion, a preliminary version of this
translation has been implemented with satisfactory results.

The logics we will describe are variants of Existential Quantified Linear Propositional
Temporal Logic (EQLTL). Wolper [23] considered extensions of LTL
with operators based on automata and right-linear grammars. He showed that
this extended logic defines exactly the $\omega$-regular languages and has the desired
complexity. However, the syntax of a logic augmented with grammars is cumbersome
and more akin to automata specifications. Sistla, Vardi, and Wolper
[19] considered variants of Wolper’s logic as well as Quantified Propositional
Linear Temporal Logic (QLTL), where they provided complexity bounds for the
satisfiability problem for formulas in the quantifier alternation hierarchy, including
EQLTL. They showed, as part of this hierarchy, that QLTL satisfiability
has non-elementary complexity and EQLTL satisfiability is PSPACE-complete.
Kupferman [8] studied branching time temporal logics with existentially quantified
propositions.

It follows from Büchi’s original theorem ([1]) that EQLTL already captures
the $\omega$-regular languages, and hence also all of QLTL. The properties of EQLTL
as a logic, in particular the complexity of decision procedures for the logic as
well as its simple syntactic normal forms, make it worthy of closer examination.

The main subject of this paper is a stutter-invariant version of EQLTL, which
we call SI-EQLTL. We will show that SI-EQLTL expresses precisely the stutter-invariant
$\omega$-regular languages. In proving this, we will also provide a simple
normal form for $\omega$-automata that accept stutter-invariant languages. We will
then describe an efficient translation algorithm from the logic into automata.

Formal definitions of all the mentioned notions are provided in the next
section. Section 3 overviews EQLTL and its correspondence to the $\omega$-regular languages,
preparing the way for Section 4, where SI-EQLTL and stutter-invariant languages
are considered. Section 5 covers the improved translation algorithm to automata,
and in Section 6 we conclude and describe an implementation of this translation.

Note: After publication of this paper as a technical report ([3]), I have been
informed by D. Peled that in [17] A. Rabinovich has independently obtained a
characterization of the stutter-invariant $\omega$-regular languages in terms of Lamport’s
Temporal Logic of Actions ([10]). Although TLA semantics generally differ
substantially from LTL, his results appear to amount to the fact that SI-QLTL
captures the stutter-invariant languages. But the complexity of the critical decision
procedures for SI-QLTL remains non-elementary because of nested negation,
and the entire reason we focus on SI-EQLTL is the complexity of these
procedures.

2 Definitions and Background 

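Let $\mathcal{L}_{LTL}^{P}$ denote the set of LTL formulas over propositional symbols
$P = \{p_1, \ldots, p_r\}$. These are defined according to the following inductive rules:

- $p_i \in \mathcal{L}_{LTL}^{P}$, for each $p_i \in P$.

- $\varphi \vee \psi$, $\neg\varphi$, $\bigcirc\varphi$, $\varphi\,U\,\psi \in \mathcal{L}_{LTL}^{P}$, for $\varphi, \psi \in \mathcal{L}_{LTL}^{P}$.

We extend LTL by allowing quantification over new propositions.
We define both the existential and universal fragments of Quantified (propositional)
Linear Temporal Logic (QLTL). Formulas in $\mathcal{L}_{EQLTL}^{P}$ and $\mathcal{L}_{AQLTL}^{P}$ are
defined, respectively, by the following additional rules:

- $\exists q_1 \ldots \exists q_k\, \varphi \in \mathcal{L}_{EQLTL}^{P}$, for $\varphi \in \mathcal{L}_{LTL}^{P \cup \{q_1, \ldots, q_k\}}$, $k \in \mathbb{N}$.

- $\forall q_1 \ldots \forall q_k\, \varphi \in \mathcal{L}_{AQLTL}^{P}$, for $\varphi \in \mathcal{L}_{LTL}^{P \cup \{q_1, \ldots, q_k\}}$, $k \in \mathbb{N}$.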

We define the semantics of EQLTL and AQLTL, over $\omega$-words and over
Kripke structures. Let our alphabet be $\Sigma_P = 2^P$. An $\omega$-word $w = w_0 w_1 w_2 \ldots \in \Sigma_P^{\omega}$
is a sequence of characters $w_i \in \Sigma_P$, where $i$ ranges over $\mathbb{N} = \{0, 1, 2, \ldots\}$.
Since we allow quantification over new propositional variables, we will also allow
enlargement of our alphabet. Given $P'$ such that $P \subseteq P'$, and given a character
$a \in \Sigma_{P'}$, we define $a|_P = \{p_i \in P \mid p_i \in a\}$. Let $w$ and $w'$ be $\omega$-words over
the alphabets $\Sigma_P$ and $\Sigma_{P'}$ respectively. We say that $w'$ is an extension of $w$ iff
$w_i = w'_i|_P$ for all $i \in \mathbb{N}$.

Given a word $w = w_0 w_1 w_2 \ldots$ and given a position $i \in \mathbb{N}$, we let $(w, i) \models \varphi$
denote the fact that $\varphi$ is true on $w$ at position $i$, defined inductively as usual,
with the following semantics for propositional quantification:

1. $(w, i) \models \exists q\, \varphi$ if there is an extension $w' \in (\Sigma_{P \cup \{q\}})^{\omega}$ of $w$ such that
$(w', i) \models \varphi$.

2. $(w, i) \models \forall q\, \varphi$ if for all extensions $w' \in (\Sigma_{P \cup \{q\}})^{\omega}$ of $w$, $(w', i) \models \varphi$.

A language over $\Sigma$ is a set $L \subseteq \Sigma^{\omega}$. A formula $\varphi$ is said to express the language
$L(\varphi) = \{w \mid (w, 0) \models \varphi\}$. We let EQLTL stand for the languages
$\bigcup_P \{L(\varphi) \mid \varphi \in \mathcal{L}_{EQLTL}^{P}\}$, and we assume analogous definitions for the other logics.

We will define a variant of EQLTL that captures precisely the stutter-invariant
$\omega$-regular languages. Before we define what stutter-invariance means, let
us define the logic. A word $w'$ is a harmonious extension of the word $w$ if $w'$
is an extension of $w$ such that, for all $i \in \mathbb{N}$, if $w_i = w_{i+1}$ then $w'_i = w'_{i+1}$. We
define a restricted quantifier $\exists^h$ which differs from ordinary quantification in
the following way:

1. $(w, i) \models \exists^h q\, \varphi$ iff there is a harmonious extension $w' \in (\Sigma_{P \cup \{q\}})^{\omega}$ of $w$ such
that $(w', i) \models \varphi$.

2. $(w, i) \models \forall^h q\, \varphi$ if for all harmonious extensions $w' \in (\Sigma_{P \cup \{q\}})^{\omega}$ of $w$,
$(w', i) \models \varphi$.

We define stutter-invariant EQLTL (AQLTL), denoted SI-EQLTL (SI-AQLTL),
by replacing existential (universal) quantification of propositions with the harmonious
quantification defined by $\exists^h$ ($\forall^h$), and by disallowing the use of the $\bigcirc$ operator
in formulas. Thus, SI-EQLTL formulas are those of the form $\exists^h q_1, \ldots, q_k\, \varphi$,
where $\varphi$ is a $\bigcirc$-free LTL formula.
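
The notions of extension and harmonious extension can be made concrete on finite prefixes of words. The following is a small sketch under that assumption (the definitions in the text are over infinite words); letters are represented as frozensets of proposition names, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Sequence

Letter = FrozenSet[str]


def restrict(letter: Letter, props: Iterable[str]) -> Letter:
    """a|_P: keep only the propositions of `letter` that belong to P."""
    return frozenset(letter) & frozenset(props)


def is_extension(w: Sequence[Letter], w_ext: Sequence[Letter],
                 props: Iterable[str]) -> bool:
    """w_ext extends w iff restricting each of its letters to P gives back w."""
    props = frozenset(props)
    return len(w) == len(w_ext) and all(
        restrict(b, props) == frozenset(a) for a, b in zip(w, w_ext))


def is_harmonious_extension(w: Sequence[Letter], w_ext: Sequence[Letter],
                            props: Iterable[str]) -> bool:
    """Additionally, whenever w repeats a letter, w_ext must repeat its letter too."""
    if not is_extension(w, w_ext, props):
        return False
    return all(w_ext[i] == w_ext[i + 1]
               for i in range(len(w) - 1) if w[i] == w[i + 1])


# Prefix over P = {p}; the new proposition q is guessed by the extension.
W = [frozenset({"p"}), frozenset({"p"}), frozenset()]
GOOD = [frozenset({"p", "q"}), frozenset({"p", "q"}), frozenset()]
BAD = [frozenset({"p", "q"}), frozenset({"p"}), frozenset()]
```

`BAD` is an ordinary extension of `W`, but it changes the value of `q` between two positions where `W` stutters, so it is not harmonious.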

A Kripke structure $K = (S, R, \kappa)$ over the alphabet $2^P$ is a set $S$ of states,
together with a transition relation $R \subseteq S \times S$, and a labeling function $\kappa : S \rightarrow 2^P$.
Let $\pi = s_0 s_1 \ldots \in S^{\omega}$ be a sequence of states of $K$. Extending the definition of
$\kappa$ to sequences, the sequence $\pi$ defines an $\omega$-word $\kappa(\pi) = \kappa(s_0)\kappa(s_1) \ldots \in \Sigma_P^{\omega}$.
Given an initial state $s_{init}$, we will say that a sequence $\pi$ is a proper sequence
with respect to $(K, s_{init})$ if $s_0 = s_{init}$ and $(s_i, s_{i+1}) \in R$, for all $i$. We now define
what it means for a formula $\varphi$ to be satisfied by a Kripke structure, given an
initial state $s_{init}$:

- $(K, s_{init}) \models \varphi$ if for every proper sequence $\pi$ of $(K, s_{init})$, $(\kappa(\pi), 0) \models \varphi$.

We now briefly recall the terminology for $\omega$-regular languages and $\omega$-automata.
A Büchi automaton is $A = (Q, \Sigma, \delta, F, Q_{start})$, where $Q$ is a set of states, $\Sigma$ an
alphabet, $\delta \subseteq Q \times \Sigma \times Q$ is a transition relation, $F \subseteq Q$ a set of final states,
and $Q_{start} \subseteq Q$ is a set of start states. Given a word $w$, a run $r$ of $A$ on $w$ is a
sequence of states $q_0 q_1 q_2 \ldots$ such that $q_0 \in Q_{start}$, $q_i \in Q$, and $(q_i, w_i, q_{i+1}) \in \delta$,
for all $i$. Let $inf(r)$ denote the set of states that occur infinitely often in $r$. $A$ is
said to accept an $\omega$-word $w$ from the alphabet $\Sigma$ if there is a run $r$ of $A$ on $w$,
such that there is some state $q \in F$ which occurs infinitely often on this run,
i.e. $F \cap inf(r) \ne \emptyset$. This is called the Büchi acceptance condition. Let $L(A)$ be
the set of $\omega$-words accepted by $A$. The $\omega$-regular languages are the class of
languages $\{L(A) \mid A \text{ a Büchi automaton}\}$.
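
For ultimately periodic words of the form $u v^{\omega}$, the Büchi acceptance condition can be checked by a finite graph search. The sketch below is an illustration only: the representation of $\delta$ as a set of triples, the function name `accepts_lasso`, and the example automaton are assumptions made here, not constructions from the paper.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple

Transition = Tuple[Hashable, Hashable, Hashable]   # (q, letter, q')


def accepts_lasso(delta: Set[Transition], start: Iterable[Hashable],
                  final: Set[Hashable], u: Sequence, v: Sequence) -> bool:
    """Buchi acceptance of the word u v^omega (v must be nonempty).

    Searches the product of the automaton with the positions of the lasso
    for a reachable node with a final state that lies on a cycle.
    """
    if not v:
        raise ValueError("the periodic part v must be nonempty")
    letters = list(u) + list(v)
    n, loop = len(letters), len(u)

    def succ(node):
        q, pos = node
        nxt = pos + 1 if pos + 1 < n else loop       # wrap around into the loop
        return {(q2, nxt) for (q1, a, q2) in delta
                if q1 == q and a == letters[pos]}

    def reach(sources):
        """Nodes reachable from `sources` in one or more steps."""
        seen, stack = set(), list(sources)
        while stack:
            for m in succ(stack.pop()):
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    init = {(q, 0) for q in start}
    reachable = init | reach(init)
    return any(q in final and (q, pos) in reach([(q, pos)])
               for (q, pos) in reachable)


# Hypothetical automaton over {a, b} accepting words with infinitely many a's.
DELTA = {(0, "a", 1), (0, "b", 0), (1, "a", 1), (1, "b", 0)}
```

With this automaton, `accepts_lasso(DELTA, {0}, {1}, "b", "ab")` holds, while the word `a b^ω` is rejected.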

Another acceptance criterion defines a Muller automaton $A = (Q, \Sigma, \delta, \mathcal{F}, Q_{start})$.
Here, instead of one set $F$ of final states, we are given a collection $\mathcal{F} \subseteq 2^Q$,
and the Muller acceptance condition states that there exists a run $r$ of $A$ on
$w$ such that $inf(r) \in \mathcal{F}$. It is easy to convert Büchi acceptance to Muller acceptance,
and, conversely, it is a well known theorem of McNaughton [13] that
Muller automata accept precisely the $\omega$-regular languages.

Given a word $w = w_0 w_1 \ldots$, and given a function $f : \mathbb{N} \rightarrow \mathbb{N}^+$ from the natural
numbers to the positive natural numbers, let:

$w[f] = w_0^{f(0)} w_1^{f(1)} w_2^{f(2)} \ldots$

Here, $w_i^n$ is shorthand for the concatenation of $n$ copies of the character $w_i$. A
language $L$ is stutter-invariant¹ if, for any $\omega$-word $w = w_0 w_1 \ldots$ and
for any $f$,

$w \in L \iff w[f] \in L$

An $\omega$-word $w$ is stutter-free if for all $i \ge 0$, $w_i \ne w_{i+1}$ or $w_i = w_j$ for all $j > i$.
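
On finite prefixes, the stuttering operation $w[f]$ and its inverse are easy to compute. The following sketch assumes finite sequences only (so the constant infinite tail allowed in a stutter-free $\omega$-word is not modeled), and the names `stutter` and `destutter` are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence


def stutter(word: Sequence[Hashable], f: Callable[[int], int]) -> List[Hashable]:
    """w[f] on a finite prefix: repeat the i-th character f(i) >= 1 times."""
    out: List[Hashable] = []
    for i, ch in enumerate(word):
        n = f(i)
        if n < 1:
            raise ValueError("f must map into the positive naturals")
        out.extend([ch] * n)
    return out


def destutter(word: Sequence[Hashable]) -> List[Hashable]:
    """Collapse every maximal block of equal consecutive characters to one."""
    out: List[Hashable] = []
    for ch in word:
        if not out or out[-1] != ch:
            out.append(ch)
    return out


STUTTERED = stutter("abca", lambda i: i + 1)
```

Here `STUTTERED` spells `abbcccaaaa`, and `destutter(STUTTERED)` recovers `abca`, the finite analogue of the stutter-free representative.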

Proposition 1. Let $L$ and $L'$ be stutter-invariant languages. Then $L' = L$ iff
they contain exactly the same stutter-free words.

The stutter-invariant $\omega$-regular languages are a strictly larger class than the
stutter-invariant LTL definable languages:

Proposition 2. The stutter-invariant language, over $\Sigma = \{a, b\}$,

$L = \{w \mid \text{the substring } ab \text{ occurs an odd number of times in } w\}$

is not LTL definable, but is $\omega$-regular.

3 EQLTL and the $\omega$-Regular Languages

It follows from the proof of Büchi’s theorem and known results about LTL that
EQLTL captures exactly the $\omega$-regular languages. In this section we will look
carefully at this correspondence. This will facilitate our proof in the next section
that a variant of this logic captures the stutter-invariant $\omega$-regular languages.

Theorem 1. (follows [1] & [18]. See also [19].) EQLTL defines exactly the
$\omega$-regular languages.

Proof. $\Leftarrow$. Given an automaton $A = (Q = \{q_1, \ldots, q_k\}, \Sigma_P, \delta, F, s)$, we write an
EQLTL formula that expresses $L(A)$. This is just the easy direction of Büchi’s
theorem: we “guess” an accepting run, and verify it, using temporal logic instead
of first-order logic. We modify things slightly to facilitate the stutter-invariant
version of this translation.



(f=3qi,...,qk{a{f\^{qiAqj)) A ( 1 ) 

( V ( A ^ A ^ ^ 

I start! 

o( V P' o{ f\ p<^ P' A ~"p^) P' ^ 4) 

{qi,a,qj)^5 pi^a 

V ( 4 ) 

qjEF 



¹ This definition differs slightly from previous definitions in the literature, e.g., in [15],
but is equivalent, and perhaps somewhat simpler conceptually because we avoid
introducing the notion of stutter equivalence in order to get at the notion of stutter
invariance.

$\Rightarrow$. Given an EQLTL formula $\varphi = \exists q_1, \ldots, q_k\, \psi$, let $A_\psi$ be a Büchi automaton
for $\psi$ (see [18,22,2]). We construct $A_\varphi$ with the same set of states, by projection
from $A_\psi$: for any edge $(Q, a, Q') \in \delta_{A_\psi}$, where $a = \{p_{i_1}, \ldots, p_{i_v}, q_{j_1}, \ldots, q_{j_u}\}$, put
the edge $(Q, a', Q')$ in $\delta_{A_\varphi}$, where $a' = \{p_{i_1}, \ldots, p_{i_v}\}$. □
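
The projection step amounts to erasing the quantified propositions from every edge label while keeping states and edges otherwise unchanged. A minimal sketch, assuming edges are stored as triples whose labels are frozensets of proposition names (the function name `project_edges` is hypothetical):

```python
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

Label = FrozenSet[str]
LabeledEdge = Tuple[Hashable, Label, Hashable]


def project_edges(delta: Iterable[LabeledEdge],
                  props: Iterable[str]) -> Set[LabeledEdge]:
    """Restrict every edge label to the propositions in `props`.

    Edges that differed only in the quantified propositions collapse into
    a single edge, which is how the existential guess is absorbed.
    """
    keep = frozenset(props)
    return {(q, frozenset(a) & keep, q2) for (q, a, q2) in delta}


# Two edges that differ only in the quantified proposition "q1".
DELTA_PSI = {
    ("s", frozenset({"p", "q1"}), "t"),
    ("s", frozenset({"p"}), "t"),
    ("t", frozenset({"q1"}), "t"),
}
DELTA_PHI = project_edges(DELTA_PSI, {"p"})
```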

Corollary 1. (see [20]) Every EQLTL formula $\varphi$ is equivalent to one in the
following normal form (note only one quantified proposition):

$\varphi' = \exists q\, \psi$

where $\psi$ is an LTL formula without the “Until” operator $U$.

Moreover, there is a polynomial $p$ such that, for an automaton $A$, there is a
normal form formula $\varphi_A$ with $L(\varphi_A) = L(A)$ and $|\varphi_A| \le p(|A|)$.

The proof is an adaptation of [20]. We can readily eliminate the $U$ operator
because the formula $\varphi$ in the proof of Theorem 1 contains none. Next we consider
the computational complexity of the associated decision procedures.

Corollary 2.

1. ([19]) The satisfiability problem for EQLTL is PSPACE-complete.

2. An EQLTL formula $\varphi$ can be translated to an equivalent Büchi automaton
of size $2^{O(|\varphi|)}$ in time proportional to the size of the output.

3. Model checking, given a Kripke structure $K$ and the negation of an EQLTL
formula, or given an AQLTL formula, is PSPACE-complete.

4 Stutter-Invariant $\omega$-Regular Languages and SI-EQLTL

Now we prove a result analogous to Theorem 1 for SI-EQLTL. First, we provide
a normal form for automata that accept a stutter-invariant language.

Definition 1. A Muller automaton $A = (Q, \Sigma, \delta, \mathcal{F}, q_{start})$ is a stutter-invariant
(SI) automaton if every state is reachable from $q_{start}$, and the
following syntactic properties are satisfied, for each state $q$ of $A$, $q \ne q_{start}$:

1. All incoming edges to $q$ are labeled with the same character, $a_q$.

2. $(q, a_q, q) \in \delta$ and $(q, b, q) \notin \delta$ for $b \ne a_q$.

3. $(q, a_q, q') \notin \delta$ for $q' \ne q$.

4. Moreover, the exceptional start state $q_{start}$ has no incoming edges.

A cleaner, equivalent way to view such automata is as Kripke structures with
(Muller) acceptance conditions, and a given set of start states. Viewed this way,
an SI automaton then amounts to a Kripke structure with the following two
additional properties:

1. Every state $s$ has a self loop, i.e., $\forall s \in S$: $R(s, s)$.

2. $\forall s \ne s' \in S$, if $R(s, s')$ then $\kappa(s) \ne \kappa(s')$.
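
The two structural properties of the Kripke-structure view are purely syntactic and can be checked directly. The sketch below assumes an explicit representation (a set of states, a set of edges, and a dictionary of labels); the name `is_si_kripke` is hypothetical, and acceptance conditions and start states are deliberately left out.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple


def is_si_kripke(states: Iterable[Hashable],
                 edges: Iterable[Tuple[Hashable, Hashable]],
                 label: Dict[Hashable, FrozenSet[str]]) -> bool:
    """Check the two properties: self loops everywhere, label change on other edges."""
    edge_set = set(edges)
    has_self_loops = all((s, s) in edge_set for s in states)
    labels_change = all(label[s] != label[t]
                        for (s, t) in edge_set if s != t)
    return has_self_loops and labels_change


STATES = {0, 1, 2}
LABEL = {0: frozenset({"p"}), 1: frozenset(), 2: frozenset({"p"})}
GOOD_EDGES = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 2)]
BAD_EDGES = GOOD_EDGES + [(2, 0)]   # states 2 and 0 carry the same label
```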



Proposition 3. An SI automaton accepts a stutter-invariant language.

Lemma 1. If $L(A)$ is stutter-invariant then $L(A) = L(A')$, where $A'$ is an SI
automaton. Moreover, we can pick $A'$ such that $|A'| \le O(|A| \times |\Sigma|)$.

Proof. We define $A'$ to mimic $A$, except that a state of $A'$ remembers the last-seen
character. In addition, we need some extra surgery in order not to allow
both arrival and departure from a state using the same character.

Given A = {Q, 6,Qstart, F), we define A' = {{Q x F) U {q^tart} ^ I 

1. ((^, 0 .), 6, (g^, c)) E 6^ lA [b = c ^ a and {q^b^q^) E 6) or [b = c = a and 

q = q') 

2. ((^, 0 .), 6, E 6^ lA [b = c ^ a and starting at state q m A there exists 

an accepting run on the word b^). 

3. 6, G iff a = 6 = c. 

4- Wstart^^y {Q 2 :b)) e lA a = b and (gi,a,g 2 ) ^ 4 for some qi e Qstart- 

Now, the accepting sets are given by: F Q x > 1 A 3(g, a) G 

F' s.t. g G N} U {{(g, a)} | g G N A (g, a, g) G U {{qT"^} | a G N}. 

By inspection, $A'$ is an SI automaton. By Proposition 3 and Proposition 1
we need only show that $L(A)$ and $L(A')$ contain the same stutter-free words.

Claim. For any stutter-free word $w$, there is an accepting run $r$ of $A$ on $w$ if and
only if there is an accepting run $r'$ of $A'$ on $w$.

We must omit the proof of the claim, which splits things into two cases:
either (1) $w = w_0 w_1 \ldots w_n^{\omega}$, or (2) $w = w_0 w_1 \ldots$, where there are never two
consecutive occurrences of the same character. The claim concludes the lemma,
as $A'$ satisfies all the required conditions. □

Theorem 2. SI-EQLTL defines exactly the stutter-invariant $\omega$-regular languages.

Proof. $\subseteq$: First, an SI-EQLTL formula $\psi = \exists^h q_1, \ldots, q_k\, \varphi$ can only express a
stutter-invariant language. To see this, consider $w$ and $w[f]$ for any $f : \mathbb{N} \rightarrow \mathbb{N}^+$.

Claim. $w \in L(\psi)$ if and only if $w[f] \in L(\psi)$.

Proof. $\Rightarrow$. Suppose $w \in L(\psi)$: then there is a harmonious extension $v$ of $w$
such that $v \in L(\varphi)$. But the $\bigcirc$-free LTL formula, $\varphi$, accepts a stutter-invariant
language ([15]), and thus since $v[f]$ is a harmonious extension of $w[f]$, and
$v[f] \in L(\varphi)$, it follows that $w[f] \in L(\psi)$.

$\Leftarrow$. Suppose $w[f] \in L(\psi)$. Thus there is a harmonious extension $v[f]$ of $w[f]$
such that $v[f] \in L(\varphi)$. But, since $L(\varphi)$ is stutter-invariant, it must again be
the case that $v$ is a harmonious extension of $w$ such that $v \in L(\varphi)$, and hence
$w \in L(\psi)$. □

Now, to see that $L(\psi)$ is $\omega$-regular: Let $A_{guess} = (Q_{guess} = \ldots \cup \{q_{start}\},
\delta_{guess}, \{q_{start}\}, Q_{guess})$ be an automaton which has a transition
$\delta_{guess}(q, a, q')$ if and only if

1. $q' = a$ (note: states, as well as transition labels, are denoted by sets of
   propositions).

2. If $a|_P = q|_P$ then $a = q$ (this ensures the harmonious nature of the guess).

Let the automaton $A_\varphi$ for $\varphi$ be derived from the tableau construction
([18, 22]). We construct the automaton $A_\psi \subseteq A_\varphi \times A_{guess}$, where the states
are $Q_\psi = \{(g, f) \mid g \in Q_\varphi,\ f \in Q_{guess}\}$, where moreover $g$ and $f$ are
consistent, meaning that, for $q_i \in cl(\varphi)$, $q_i \in g \Leftrightarrow q_i \in f$. The transition
relation $\delta_\psi((g, f), a, (g', f'))$ holds iff $\delta_\varphi(g, a, g')$ and $\delta_{guess}(f, a, f')$,
the start states are $Q_{\psi start} = \{(g, f) \mid \varphi \in g\}$, and $F_\psi = F_\varphi \times F_{guess}$. It can
be verified that the automaton $A_\psi$ accepts $L(\psi)$, with $A_{guess}$ basically used
to ensure that we only "guess" harmonious extensions.

$\supseteq$. Given $L(A)$, a stutter-invariant language, our objective is to write an
SI-EQLTL formula expressing the language $L(A)$. By Lemma 1, we can assume
that $A$ is a stutter-invariant automaton. We will use the fact ([15]) that
next-free LTL captures precisely the stutter-invariant subset of LTL.

Consider an EQLTL formula $\psi = \exists q_1, \ldots, q_k\, \varphi$ expressing the language $L(A)$.
The crucial point is that because $A$ has the syntactic normal form, there exists
an accepting run $r$ of $A$ on $w$ iff there exists a harmonious extension $v$ of $w$
such that $v$ satisfies the LTL formula $\varphi$. Thus, it suffices to convert the
quantification over $q_1, \ldots, q_k$ to the corresponding harmonious quantification.

It only remains to remove the next-time ($\bigcirc$) operators from the expression.
This can be done by extending the proof of [15]. Let SI-EQLTL($\bigcirc$) be the logic
where we do allow the $\bigcirc$ operator to occur in the LTL part of the formula.

Claim. Any SI-EQLTL($\bigcirc$) formula $\psi$ that accepts a stutter-invariant language
can be converted to an SI-EQLTL formula $\psi'$ such that $L(\psi) = L(\psi')$.

The proof is as in [15]. The only new observation needed is that any harmonious
extension of a stutter-free word is also stutter-free. That concludes the theorem.

□

To prove a normal form result for SI-EQLTL analogous to that for EQLTL, which
eliminates the use of the binary U operator, we will need the following
stutter-invariant version of the $\bigcirc$ operator, $\bigcirc^*$, which intuitively means "at
the next distinct character". Formally, let $\bigcirc^*$ be defined as follows:

- $(w, i) \models \bigcirc^* \varphi$ iff $\exists j > i$ such that $w_j \ne w_i$, $w_{i'} = w_i$ for all $i'$ with
  $i < i' < j$, and $(w, j) \models \varphi$.
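The "next distinct character" reading can be made concrete with a minimal sketch. It assumes a finite prefix of a word stored as a Python sequence, whereas the operator itself is defined over infinite words; the helper names `next_distinct` and `destutter` are hypothetical and not part of the original construction.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def next_distinct(word: Sequence[T], i: int) -> Optional[int]:
    """Return the least j > i with word[j] != word[i].

    Returns None if the finite prefix ends before a distinct
    character appears (on an infinite word this would correspond
    to the character at position i repeating forever).
    """
    for j in range(i + 1, len(word)):
        if word[j] != word[i]:
            return j
    return None


def destutter(word: Sequence[T]) -> List[T]:
    """Collapse every maximal block of equal characters to one character."""
    out: List[T] = []
    for c in word:
        if not out or out[-1] != c:
            out.append(c)
    return out
```

Evaluating a formula at `next_distinct(word, i)` instead of `i + 1` is what makes the operator insensitive to how often a character is repeated.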

Let SI-EQLTL($\bigcirc^*$, $\Diamond$) denote the variant of SI-EQLTL where the U operator
is disallowed, but $\bigcirc^*$ allowed. It can be shown that

Corollary 3. SI-EQLTL expresses the same languages as SI-EQLTL($\bigcirc^*$, $\Diamond$).

We are unable to provide a normal form where only one existential
quantification is necessary, because Thomas's elimination argument [20] does
not work in the stutter-invariant setting. It would be interesting to establish
whether such a normal form exists. Finally, we address the costs and
complexities involved in the translations and results mentioned above. They
are, as one would expect, basically the same as for EQLTL.

Corollary 4.

1. The satisfiability problem for SI-EQLTL is PSPACE-complete.

2. An SI-EQLTL formula $\psi$ can be translated to an equivalent Büchi automaton
   of size … in time proportional to the size of the output.

3. Model checking, given a Kripke structure $K$ and the negation of an
   SI-EQLTL formula, or given an SI-AQLTL formula, is PSPACE-complete.

5 Improved Algorithm for SI-EQLTL Conversion to 
Automata 

In translating from LTL formulas to automata, a naive implementation of the
tableau construction always incurs the worst-case exponential blow-up. In [4] an
algorithm is provided which in practice behaves much better. A version of this
algorithm for next-free LTL has been implemented in the SPIN tool.

For our purposes, it is important to know that, rather than a standard Büchi
automaton $A = (Q, \delta, \mathcal{F}, s)$ over $\Sigma_P = 2^P$, the automaton produced by the
algorithm of [4] actually has the following special form: for every state
$q \in Q$, there is a unique term $\sigma_q$ (a conjunction of literals) from $P$, such
that every "edge" from another state to $q$ has the label $\sigma_q$; this term is a
symbolic shorthand for all the characters consistent with it, meaning the
actual edge $(q', a, q)$ exists iff $a$ is consistent with the term $\sigma_q$. In
practice, this shorthand can be much more concise than the ordinary notation
for automata. Later, we will need another important fact about the output of
the [4] algorithm, namely, a monotonicity property which it preserves.

We now describe how to modify the [4] algorithm to work for translating from
both EQLTL and SI-EQLTL to automata. For EQLTL, modifying the algorithm
is trivial. The only observation necessary is that existential quantification
corresponds to projection, even on term-labeled edges:

Proposition 4. Given an EQLTL formula $\varphi = \exists q_1 \ldots q_k\, \psi$, and given a special
form automaton $A_\psi$ such that $L(\psi) = L(A_\psi)$, the special form automaton $A_\varphi$
derived from $A_\psi$ by removing all literals over $\{q_1, \ldots, q_k\}$ from the terms
labeling the edges of $A_\psi$ defines precisely the language defined by $\varphi$, i.e.,
$L(\varphi) = L(A_\varphi)$.
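The projection in Proposition 4 can be sketched as follows. This is only an illustration under stated assumptions: a term is encoded as a frozenset of `(proposition, polarity)` pairs, a special form automaton is reduced to a map from each state to the unique term on its incoming edges, and the names `project_term` and `project_special_form` are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

Literal = Tuple[str, bool]          # (proposition name, True if positive)
Term = FrozenSet[Literal]           # conjunction of literals


def project_term(term: Term, quantified: Iterable[str]) -> Term:
    """Drop every literal over a quantified proposition from a term."""
    hidden = set(quantified)
    return frozenset(lit for lit in term if lit[0] not in hidden)


def project_special_form(
    incoming_term: Dict[Hashable, Term], quantified: Iterable[str]
) -> Dict[Hashable, Term]:
    """Apply the projection to the term attached to every state.

    States and edges are left untouched, so the projected automaton
    has exactly as many states as the original one.
    """
    hidden = set(quantified)
    return {q: project_term(t, hidden) for q, t in incoming_term.items()}
```

Because only labels change, the sketch also shows why the automaton for the quantified formula is never bigger than the one for its LTL part.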

Note that the automaton generated for $\exists q_1 \ldots q_k\, \psi$ is never bigger than the
one generated for $\psi$. The case of SI-EQLTL is more interesting and complicated.
In particular, it is not in general possible to obtain an automaton for the
SI-EQLTL formula $\exists t_1, \ldots, t_k\, \psi$ which is no bigger than the automaton for $\psi$
produced by the [4] algorithm.

The following theorem gives an algorithm that incurs exponential blow-up
in terms of $k$ and quadratic blow-up in terms of the number of unquantified
propositions in $\psi$. We then give a modification of this algorithm which in
practice behaves much better.

Theorem 3. Given a formula $\varphi = \exists t_1 \ldots t_k\, \psi$, where $\psi$ is a formula over the
propositions $t_1, \ldots, t_k, p_1, \ldots, p_r$, there is a special form automaton $A_\varphi$
such that $L(A_\varphi) = L(\varphi)$, and such that

    $|A_\varphi| \le O(2^k \times r^2 \times |A_\psi|)$

where $A_\psi$ is a special form automaton for the formula $\psi$ (as produced by the
[4] algorithm).

Proof. Let $T = \{t_1, \ldots, t_k\}$. Given the set $P = \{p_1, \ldots, p_r\}$ of propositions,
let $\neg P$ denote the set $\{\neg p_1, \ldots, \neg p_r\}$ of negations of these propositions.
Given a special form automaton $A_\psi = (Q, \delta, \mathcal{F}, q_{start})$ for $\psi$, we would like to
obtain an automaton $A_\varphi$ for $\varphi$.

We now define an automaton $A'$ which will be central to the definition of
$A_\varphi$.

    $A' = (Q' \subseteq (Q \times 2^T \times (P \cup \neg P)^2) \cup \{s_{start}\},\ \delta',\ \mathcal{F}',\ s_{start})$

The states of $A'$, other than the start state $s_{start}$, consist of those triplets
$(q, \theta, \tau)$, where $q \in Q$ is a state of $A_\psi$, $\theta$ is a subset of $T$ and defines a full
valuation of all the variables in $T$, and $\tau$ specifies two elements of the set
$P \cup \neg P$, where these two elements are consistent with each other, i.e., it is
not the case that one is the negation of the other. Furthermore, for $(q, \theta, \tau)$
to qualify as a state in $Q'$ it must also satisfy the following extra condition:
we view $\tau$ as specifying a partial valuation of $P$, namely, the two literals
specified in $\tau$ must hold. Now, the extra condition that must be satisfied by
$(q, \theta, \tau)$ is that the unique term $\sigma_q$ which labels all edges in $A_\psi$ which enter
the state $q$ must be consistent with the valuation (full on $T$ and partial on
$P$) specified by $\theta$ and $\tau$. $A'$ will also be a special form automaton, in that all
edges into a state $s = (q, \theta, \tau)$ will be labeled with the same term $\sigma_s$.

The transition $\delta'(s' = (q', \theta', \tau'), \sigma_s, s = (q, \theta, \tau))$ will exist if and only if
all of the following conditions hold:

1. $\sigma_s$ is consistent with the valuation defined by $\theta$, and the partial
   valuation on the pair of $p_i$'s defined by $\tau$.

2. There is a transition $\delta(q', \sigma_q, q)$ in $A_\psi$, such that the term $\sigma_q$ is a subterm
   of the term $\sigma_s$, meaning that every literal of $\sigma_q$ is also a literal of $\sigma_s$.

3. If the valuations $\theta'$ and $\theta$ on the $t_i$'s are inconsistent, then the partial
   valuations defined by $\tau'$ and $\tau$ must also be inconsistent.

The transitions out of the exceptional start state $s_{start}$ are defined as
follows: $\delta'(s_{start}, \sigma_s, (q, \theta, \tau))$ holds if and only if the first two conditions
above hold, with $q_{start}$ substituted for $q'$ in the statement of the second
condition. Next we define the acceptance condition in $A'$. For each set
$F = \{q_{i_1}, \ldots, q_{i_r}\} \in \mathcal{F}$, we put the set
$F' = \{s = (q_{i_j}, \theta, \tau) \mid j \in \{1, \ldots, r\} \wedge s \in Q'\}$ in $\mathcal{F}'$.

From the automaton $A'$ we will now obtain the automaton $A_\varphi$, by "projecting
out" the $t_i$'s as we normally would for regular existential quantification. More
formally: for each transition $\delta'(s', \sigma_s, s)$ of $A'$ the term $\sigma_s$ is replaced in
$A_\varphi$ by a term $\sigma'_s$ with all the literals over the $t_i$'s removed, obtaining the
transition $\delta_\varphi(s', \sigma'_s, s)$. It remains to show that $A_\varphi$ defines the language we
are after.

Lemma 2. Given $\varphi = \exists t_1 \ldots t_k\, \psi$, and given $A_\varphi$ obtained from $\varphi$ via the above
construction:

    $L(A_\varphi) = L(\varphi)$

We have to omit the proof, which is a bit technical, but the reason we need
only remember two literals from $(P \cup \neg P)$ is that we can "guess" the literals
over $P$ which will distinguish distinct consecutive characters, and we need two
literals rather than one per character because we use one to distinguish it from
its predecessor and the other to distinguish it from its successor. This concludes
our theorem. □

The construction of $A'$ was the crucial step in Theorem 3, and it is there
that a $2^k \times r^2$ blow-up occurs. As it stands, $A'$ always suffers from this
blow-up. However, with an important observation, we can modify the algorithm
so that the blow-up in $k$ need not be worst-case.

In going from $A_\psi$ to $A'$, for every state $q$ of $A_\psi$, and every $\tau$, we built the
states $(q, \theta, \tau)$ for every full valuation $\theta$ of $T$. The observation is that these
$\theta$'s need not be full valuations of $T$. In fact, the only condition we need is
that, for any run of $A_\psi$, the sequence of $\theta$'s in our new automaton forms a
monotonic (non-increasing) sequence of partial valuations of $T$, by which we
mean that the domains form a monotonic sequence of sets. This assures us that,
if there is any accepting run in the resulting automaton, then an accepting run
can be constructed where the "guessed" variables form a harmonious extension.
The reason, intuitively, is that we can safely extend a partial valuation of $T$
that is consistent with a predecessor to fully match this predecessor valuation,
without forcing an inconsistency with future valuations, because future
valuations can evaluate at most the same or fewer variables and can thus be
extended likewise if they are consistent with their predecessor.

But how are we to come up with such a monotonically decreasing sequence
of partial valuations to satisfy the condition? It turns out that we are in a
rather fortuitous situation. The output of [4]'s algorithm already provides us
with such a sequence. Each state of that output is marked by a set of
subformulas of the original temporal logic formula being translated, and here
is where we find our monotonic sequence: on any path in the automaton the set
of $t_i$'s that occur in the set of subformulas in each node form a monotonic
(non-increasing) sequence. We can thus simply use these sets as our monotonic
sequence directly from the [4] algorithm. Using this observation, we can reduce
the state space of $A'$ by only evaluating the $t_i$'s that need to be evaluated for
each given state $q$ of $A_\psi$. We have to omit details.

6 Conclusions 

We have provided a simple temporal logic, SI-EQLTL, for expressing the
stutter-invariant $\omega$-regular languages. We have shown that the basic algorithms
for LTL, e.g., satisfiability and conversion to automata, can be modified to the
setting of SI-EQLTL without any substantial penalty in computational
complexity. Along the way, we have defined stutter-invariant automata, a
syntactic normal form for automata which captures the stutter-invariant
$\omega$-regular languages.

The purpose of such a logic is to close the gap, in a natural way, between
systems like COSPAN, where properties are specified as $\omega$-automata, and those
such as SPIN, where properties are specified in the weaker LTL formalism, and
where, moreover, only stutter-invariant properties are allowed in order to enable
partial-order verification.

One potential criticism of both SI-EQLTL and EQLTL is that, although
both logics are semantically closed under complementation, they are not
syntactically so. Indeed, complementation of a formula is, in the worst case,
costly: it incurs an exponential blow-up. As a result, in order to perform model
checking with the same complexity as LTL, we need to work with the negation of
SI-EQLTL formulas or with SI-AQLTL formulas. Although this is undesirable, it
is a situation no different from the behavior of non-deterministic $\omega$-automata,
for which complementation is similarly costly. This was the reason behind [11]'s
advocacy of $\forall$-automata. In other words, in either formalism one has to deal
with the cost of complementation, but the benefits of a more succinct logical
representation make these temporal logics an attractive alternative to automata.

We have implemented the translation algorithm of Section 5 in the programming
language ML, extending an implementation due to Doron Peled of the [4]
algorithm. Preliminary experiments indicate that the translation produces
reasonably sized automata. The intention has been to ultimately incorporate the
extended logic, using the translation, in Gerard Holzmann's tool SPIN in order
to supply it with the extra expressive power.
Abstract. We improve the state-of-the-art algorithm for obtaining an
automaton from a linear temporal logic formula. The automaton is intended
to be used for model checking, as well as for satisfiability checking.
Therefore, the algorithm is mainly concerned with keeping the automaton
as small as possible. The experimental results show that our algorithm
outperforms the previous one, with respect to both the size of the
generated automata and computation time. The testing is performed
following a newly developed methodology based on the use of randomly
generated formulas.



1 Introduction 

This paper focuses on the explicit-state automata-based approach to model
checking of linear temporal logic specifications [VW86,VW94,Hol97]. In this
approach, both the system and the negation of the specifications are turned into
automata on infinite words [Tho90]. The former automaton recognizes the system
execution sequences, while the latter one comprises all the execution sequences
(models) violating the specifications. Verification then amounts to checking
whether the language recognized by the synchronous product of the above
automata is empty. Similarly, satisfiability checking amounts to checking that
the language recognized by the automaton built for the formula to be checked is
nonempty. Satisfiability also plays an important role in model checking, for
avoiding model checking unsatisfiable or valid specifications.

The automaton for the specifications can have as many as $2^{O(n)}$ states, where
$n$ is the number of subformulas of the specifications [VW94]. Therefore, the size
of the product automaton, which determines the overall complexity of the
method, is proportional to $N \cdot 2^{O(n)}$, where $N$ is the number of reachable system
states. For these reasons, it is clearly desirable to keep the specification
automaton as small as possible, and to work on-the-fly, that is, to detect that a
system violates its specifications by constructing and visiting only some part of
the search space containing the bug. Note that even though in practice the
assertions being verified are typically expressed by short formulas, it is often
impossible to verify these properties without making some assumptions on the
environment of the system being verified. Thus, in practice, one typically model
checks formulas of the form $\varphi \rightarrow \psi$, where the assertion $\psi$ may be quite simple,
but the assumption $\varphi$ may be rather complicated.

Supported in part by NSF grants CCR-9628400 and CCR-9700061, and by a grant
from the Intel Corporation. Part of this work was done while the first author was a
visiting student and the third author was a Varon Visiting Professor at the Weizmann
Institute of Science.

N. Halbwachs and D. Peled (Eds.): CAV'99, LNCS 1633, pp. 249-260, 1999.
© Springer-Verlag Berlin Heidelberg 1999

M. Daniele, F. Giunchiglia, and M.Y. Vardi

The state-of-the-art on-the-fly algorithms for turning specifications into
automata and performing the emptiness check can be found in [CVWY91] and
[GPVW95]. Such algorithms define the kernel of the model checker SPIN [Hol97].
We refer to the algorithm described in [GPVW95] as GPVW. That paper also
discusses several possible improvements. We refer to the improved algorithm
as GPVW+. An alternative automata construction for temporal specifications
[KMMP93] starts with a two-state automaton that is repeatedly "refined" until
all models of the specifications are realized. Due to this refinement process,
however, this algorithm cannot be used in an on-the-fly fashion. Another approach
could be turning the on-the-fly decision procedure presented in [Sch98] into a
procedure for automata construction. It is not clear, however, whether and how
this modification could be done, for that procedure is geared towards finding
and representing one model, but not all models.

In this paper we present, and describe experiments with, an algorithm for
building an automaton from a linear temporal logic formula. Our algorithm,
hereafter LTL2AUT, though being based on GPVW+, is geared towards building
smaller automata in less time. Our improvements are based on simple syntactic
techniques, carried out on-the-fly when states are processed, that allow us
to eliminate the need of storing some information. Experimental results
demonstrate that GPVW+ significantly outperforms GPVW and show that LTL2AUT
further outperforms GPVW+, with respect to both the size of the generated
automata and computation time. The testing has been performed following a
newly developed methodology, which, inspired by the methodologies proposed in
[MSL92] and [GS96] for propositional and modal K logics, is based on randomly
generated formulas.

The rest of the paper is structured as follows. Section 2 introduces linear
temporal logic and automata on infinite words. Section 3 presents the core
underlying GPVW, GPVW+, and LTL2AUT, and Section 4 shows how GPVW,
GPVW+, and LTL2AUT can be obtained by suitably instantiating such core.
The testing is divided between Section 5, where our test method is discussed,
and Section 6, where a comparison of the three algorithms is given. Finally, we
make some concluding remarks in Section 7.
2 Preliminaries

The set of Linear Temporal Logic formulas (LTL) is defined inductively starting
from a finite set $\mathcal{P}$ of propositions, the standard Boolean operators, and the
temporal operators X ("next time") and U ("until") as follows:

- each member of $\mathcal{P}$ is a formula,

- if $\mu_1$ and $\mu_2$ are formulas, then so are $\neg\mu_1$, $\mu_1 \vee \mu_2$, $\mu_1 \wedge \mu_2$, $X\mu_1$, and
  $\mu_1\, U\, \mu_2$.

An LTL interpretation is a function $\xi : \mathbb{N} \to 2^{\mathcal{P}}$, i.e., an infinite word over
the alphabet $2^{\mathcal{P}}$, which maps each instant of time into the propositions holding
at such instant. We write $\xi_i$ for denoting the interpretation $\lambda t.\, \xi(t + i)$. LTL
semantics is then defined inductively as follows:

- $\xi \models p$ iff $p \in \xi(0)$, for $p \in \mathcal{P}$,

- $\xi \models \neg\mu_1$ iff $\xi \not\models \mu_1$,

- $\xi \models \mu_1 \vee \mu_2$ iff $\xi \models \mu_1$ or $\xi \models \mu_2$,

- $\xi \models \mu_1 \wedge \mu_2$ iff $\xi \models \mu_1$ and $\xi \models \mu_2$,

- $\xi \models X\mu_1$ iff $\xi_1 \models \mu_1$,

- $\xi \models \mu_1\, U\, \mu_2$ iff there exists $i \ge 0$ such that $\xi_i \models \mu_2$ and, for all
  $0 \le j < i$, $\xi_j \models \mu_1$.

As usual, we have $T = p \vee \neg p$ and $F = \neg T$. Moreover, we define
$\mu_1\, V\, \mu_2 = \neg(\neg\mu_1\, U\, \neg\mu_2)$. This latter operator allows each formula to be turned
into negation normal form, that is, it allows the pushing of the $\neg$ operator
inwards until it occurs only before propositions, without causing an exponential
blow up in the size of the translated formula. From now on, each formula is
considered to be in negation normal form.
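The push-inwards step can be sketched as below. The encoding is an assumption made for illustration only: formulas are nested tuples such as `("U", ("ap", "p"), ("ap", "q"))`, and the function name `nnf` is hypothetical. The sketch uses the dualities between conjunction and disjunction, between U and V, and the fact that negation commutes with X.

```python
TRUE, FALSE = ("true",), ("false",)

# Operators that swap with each other when a negation is pushed through them.
DUAL = {"and": "or", "or": "and", "U": "V", "V": "U"}


def nnf(f):
    """Return an equivalent formula where "not" occurs only before propositions."""
    op = f[0]
    if op in ("true", "false", "ap"):
        return f
    if op == "X":
        return ("X", nnf(f[1]))
    if op in DUAL:
        return (op, nnf(f[1]), nnf(f[2]))
    # op == "not": inspect the negated subformula g
    g = f[1]
    gop = g[0]
    if gop == "ap":
        return f                      # a literal, already in normal form
    if gop == "true":
        return FALSE
    if gop == "false":
        return TRUE
    if gop == "not":
        return nnf(g[1])              # double negation
    if gop == "X":
        return ("X", nnf(("not", g[1])))
    # binary operator: switch to the dual and negate both arguments
    return (DUAL[gop], nnf(("not", g[1])), nnf(("not", g[2])))
```

Each operator of the input is visited once and replaced by exactly one operator, which is why the translation does not blow up the formula size.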

A literal is either a proposition or its negation, an elementary formula is either
T, or F, or a literal, or an X-formula. A set of formulas is said to be elementary if
all its formulas are. A non-elementary formula $\mu$ can be decomposed, according
to the tableau rules of Figure 1, so that
$\mu \leftrightarrow \bigwedge_{\beta_1 \in \alpha_1(\mu)} \beta_1 \vee \bigwedge_{\beta_2 \in \alpha_2(\mu)} \beta_2$.



    μ           α1(μ)         α2(μ)
    μ1 ∧ μ2     {μ1, μ2}      {F}
    μ1 ∨ μ2     {μ1}          {μ2}
    μ1 U μ2     {μ2}          {μ1, X(μ1 U μ2)}
    μ1 V μ2     {μ2, μ1}      {μ2, X(μ1 V μ2)}

Fig. 1. Tableau rules.



Finally, a cover of a set $A$ of formulas is a, possibly empty, set of sets
$C = \{C_i : i \in I\}$ such that $\bigwedge_{\mu \in A} \mu \leftrightarrow \bigvee_{i \in I} \bigwedge_{\eta \in C_i} \eta$.



We represent formulas via labeled generalized Büchi automata. A generalized
Büchi automaton is a quadruple $A = \langle Q, I, \delta, \mathcal{F} \rangle$, where $Q$ is a finite set of
states, $I \subseteq Q$ is the set of initial states, $\delta : Q \to 2^Q$ is the transition function,
and $\mathcal{F} \subseteq 2^{2^Q}$ is a, possibly empty, set of sets of accepting states
$\mathcal{F} = \{F_1, F_2, \ldots, F_n\}$. An execution of $A$ is an infinite sequence
$\rho = q_0 q_1 q_2 \ldots$ such that $q_0 \in I$ and, for each $i \ge 0$, $q_{i+1} \in \delta(q_i)$. $\rho$ is an
accepting execution if, for each $F_i \in \mathcal{F}$, there exists $q_j \in F_i$ that appears
infinitely often in $\rho$. A labeled generalized Büchi automaton is a triple
$\langle A, \mathcal{D}, \mathcal{L} \rangle$, where $A$ is a generalized Büchi automaton, $\mathcal{D}$ is some finite
domain, and $\mathcal{L} : Q \to 2^{\mathcal{D}}$ is the labeling function. A labeled generalized Büchi
automaton accepts a word $\xi = x_0 x_1 x_2 \ldots$ from $\mathcal{D}^\omega$ iff there exists an accepting
execution $\rho = q_0 q_1 q_2 \ldots$ of $A$ such that $x_i \in \mathcal{L}(q_i)$, for each $i \ge 0$.
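The generalized acceptance condition can be illustrated on ultimately periodic executions. The sketch below assumes an execution given as a finite prefix followed by a loop that repeats forever, with the transition function stored as a dictionary of successor sets; the function name `is_accepting_lasso` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def is_accepting_lasso(
    prefix: List[Hashable],
    loop: List[Hashable],
    delta: Dict[Hashable, Set[Hashable]],
    initial: Set[Hashable],
    acc_sets: Iterable[Set[Hashable]],
) -> bool:
    """Check that prefix . loop^omega is an accepting execution."""
    if not loop:
        return False                      # an execution must be infinite
    run = prefix + loop
    if run[0] not in initial:
        return False
    # consecutive steps, plus the step that closes the loop
    steps = list(zip(run, run[1:])) + [(loop[-1], loop[0])]
    if any(b not in delta.get(a, set()) for a, b in steps):
        return False
    # exactly the loop states occur infinitely often, so every
    # accepting set has to intersect the loop
    return all(set(loop) & set(f) for f in acc_sets)
```

With an empty family of accepting sets every execution is accepting, which matches the "possibly empty" case of the definition.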

3 The Core 

LTL2AUT, GPVW+, and GPVW can be obtained by suitably instantiating the 
core we are about to present. The instantiation affects some functions that, in 
what follows, are highlighted through the SMALL CAPITAL font. The central part 
of the core is the computation of a cover of a set of formulas, which is used for 
generating states. The propositional information will be used for defining the 
labeling, while the X information will be used to define the transition function. 

3.1 Cover Computation 

The algorithm for computing covers is defined by extending the propositional
tableau in order to allow it to deal with temporal operators. The fundamental
rules used for decomposing temporal operators are the identity
$\mu\, U\, \eta = \eta \vee (\mu \wedge X(\mu\, U\, \eta))$ and its dual $\mu\, V\, \eta = \eta \wedge (\mu \vee X(\mu\, V\, \eta))$. The line
numbers in the following description refer to the algorithm appearing in
Figure 2. The algorithm handles the following data structures:

ToCover   The set of formulas to be covered but still not processed.

Current   The element of the cover currently being computed.

Covered   The formulas already processed and covered by Current.

Cover     The cover so far computed.

When computing the current element of the cover, the algorithm first checks
whether all the formulas have been covered (line 4). If so, Current is ready to
be added to Cover (line 5). If a formula $\mu$ has still to be covered (line 6), the
algorithm checks whether $\mu$ has to be stored in the current element of the cover
(line 8) and, if so, adds it to Current (line 9). Processing $\mu$ can be avoided in
two cases: if there is a contradiction involving it (line 10) or it is redundant
(line 12). In the former case, Current is discarded (line 11), while in the latter
one $\mu$ is discarded (line 13). Finally, if $\mu$ does need to be covered, it is covered
according to its syntactic structure. If $\mu$ is elementary, it is covered simply by
itself (line 15). Otherwise, $\mu$ is covered by covering, according to the tableau
rules appearing in Figure 1, either $\alpha_1(\mu)$ (line 16) or $\alpha_2(\mu)$ (line 18). This is
justified by recalling that
$\mu \leftrightarrow \bigwedge_{\beta_1 \in \alpha_1(\mu)} \beta_1 \vee \bigwedge_{\beta_2 \in \alpha_2(\mu)} \beta_2$.
     1  function Cover(A)
     2    return cover(A, ∅, ∅, ∅)

     3  function cover(ToCover, Current, Covered, Cover)
     4    if ToCover = ∅
     5    then return Cover ∪ {Current}
     6    else select μ from ToCover
     7      remove μ from ToCover and add it to Covered
     8      if HAS_TO_BE_STORED(μ)
     9      then Current = Current ∪ {μ}
    10      if CONTRADICTION(μ, ToCover, Current, Covered)
    11      then return Cover
    12      else if REDUNDANT(μ, ToCover, Current, Covered)
    13      then return cover(ToCover, Current, Covered, Cover)
    14      else if μ is elementary
    15      then return cover(ToCover, Current ∪ {μ}, Covered, Cover)
    16      else return cover(ToCover ∪ (α1(μ) \ Current),
    17                        Current, Covered,
    18                        cover(ToCover ∪ (α2(μ) \ Current),
    19                              Current, Covered, Cover))

Fig. 2. Cover computation.
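A runnable rendering of Figure 2 may help in following the recursion. The sketch below is not the authors' implementation: it fixes the simplest (GPVW-style) instantiation, in which every processed formula is stored, a contradiction is the formula F or a literal whose negation is already in Current, and nothing is ever redundant. Formulas in negation normal form are encoded as nested tuples, and all function names are hypothetical.

```python
TRUE, FALSE = ("true",), ("false",)


def alpha(mu):
    """Tableau rules of Fig. 1: return (alpha1(mu), alpha2(mu))."""
    op = mu[0]
    if op == "and":
        return {mu[1], mu[2]}, {FALSE}
    if op == "or":
        return {mu[1]}, {mu[2]}
    if op == "U":
        return {mu[2]}, {mu[1], ("X", mu)}
    if op == "V":
        return {mu[2], mu[1]}, {mu[2], ("X", mu)}
    raise ValueError("elementary formulas are not decomposed")


def is_elementary(mu):
    return mu[0] in ("true", "false", "ap", "not", "X")


def contradiction(mu, current):
    """mu is F, or mu is a literal whose negation is already in Current."""
    if mu == FALSE:
        return True
    if mu[0] == "ap":
        return ("not", mu) in current
    return mu[0] == "not" and mu[1] in current


def cover(to_cover, current=frozenset(), covered=frozenset(), acc=frozenset()):
    if not to_cover:                                    # line 4
        return acc | {current}                          # line 5
    mu = min(to_cover, key=repr)                        # line 6, fixed order
    to_cover = frozenset(to_cover) - {mu}               # line 7
    covered = covered | {mu}
    current = current | {mu}                            # lines 8-9: always stored
    if contradiction(mu, current):                      # line 10
        return acc                                      # line 11
    # line 12: the redundancy test is constantly false in this instantiation
    if is_elementary(mu):                               # line 14
        return cover(to_cover, current, covered, acc)   # line 15
    a1, a2 = alpha(mu)
    acc = cover(to_cover | (a2 - current), current, covered, acc)   # lines 18-19
    return cover(to_cover | (a1 - current), current, covered, acc)  # lines 16-17
```

For example, `cover(frozenset({("U", p, q)}))` yields two elements, one containing `q` and one containing `p` together with the next-time obligation `("X", ("U", p, q))`.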



3.2 The Automaton Construction 

Our goal is to build a labeled generalized Büchi automaton recognizing exactly all
the models of a linear time temporal logic formula $\psi$. The algorithm is presented
in two phases. First, we introduce the automaton structure, i.e., its states, which
are obtained as covers, initial states, and transition function. The line numbers
in the following description refer to this part of the algorithm, which appears in
Figure 3. Then, we complete such structure by defining labeling and acceptance
conditions.

The algorithm starts by computing the initial states as cover of $\{\psi\}$ (line 2).
A set $U$ of states whose transition function has still to be defined is kept. All
the initial states are clearly added to $U$ (line 2). When defining the transition
function for the state $s$ (line 4), we first compute its successors as cover of
$\{\mu : X\mu \in s\}$ (line 5). For each computed successor $r$, the algorithm checks
whether $r$ has been previously generated as a state $r'$ (line 6). If so, it suffices
to add $r'$ to $\delta(s)$ (line 7). Otherwise, $r$ is added to $Q$ and $\delta(s)$ (lines 8 and 9).
Moreover, $r$ is also added to $U$ (line 10), for $\delta(r)$ to be eventually computed.

The domain $\mathcal{D}$ is $2^{\mathcal{P}}$ and the label of a state $s$ consists of all elements of
$2^{\mathcal{P}}$ that are compatible with the propositional information contained in $s$.
More in detail, let $Pos(s)$ be $s \cap \mathcal{P}$ and $Neg(s)$ be $\{p \in \mathcal{P} : \neg p \in s\}$. Then,
$\mathcal{L}(s) = \{X : X \subseteq \mathcal{P} \wedge Pos(s) \subseteq X \wedge X \cap Neg(s) = \emptyset\}$. Finally, we have to impose
acceptance conditions. Indeed, our construction allows some executions inducing
interpretations that are not models of $\psi$. This happens because it is possible to
procrastinate forever the fulfilling of U-formulas, and arises because the
formula $\mu\, U\, \eta$ can be covered by covering $\mu$ and by promising to fulfill it later
by covering $X(\mu\, U\, \eta)$. The problem is solved by imposing generalized Büchi
acceptance conditions. Informally, for each subformula $\mu\, U\, \eta$ of $\psi$, we define a
set $F_{\mu U \eta} \in \mathcal{F}$ containing states $s$ that either do not promise it or immediately
fulfill it. In this way, postponing forever the fulfilling of a promised
U-formula no longer gives rise to accepting executions. Formally, we set
$F_{\mu U \eta} = \{s \in Q : \mathrm{SATISFY}(s, \mu\, U\, \eta) \rightarrow \mathrm{SATISFY}(s, \eta)\}$ where, again, SATISFY is a
function that will be subject to instantiation.

     1  procedure create_automaton_structure(ψ)
     2    U = Q = I = Cover({ψ}), δ = ∅
     3    while U ≠ ∅
     4      remove s from U
     5      for r ∈ Cover({μ : Xμ ∈ s})
     6        if ∃r' ∈ Q such that r = r'
     7        then δ(s) = δ(s) ∪ {r'}
     8        else Q = Q ∪ {r}
     9             δ(s) = δ(s) ∪ {r}
    10             U = U ∪ {r}

Fig. 3. The algorithm.
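The worklist loop of Figure 3 and the labeling can be sketched as follows. This is a minimal illustration under assumptions: states are frozensets of tuple-encoded formulas, the cover computation is passed in as a callable (any function mapping a frozenset of formulas to a collection of frozensets), and the names `create_automaton_structure` and `label` are hypothetical.

```python
from itertools import combinations


def create_automaton_structure(psi, cover):
    """Return (states, initial, delta) following the loop of Fig. 3."""
    initial = set(cover(frozenset({psi})))              # line 2
    states = set(initial)
    unprocessed = list(initial)
    delta = {}
    while unprocessed:                                  # line 3
        s = unprocessed.pop()                           # line 4
        next_formulas = frozenset(f[1] for f in s if f[0] == "X")
        successors = set(cover(next_formulas))          # line 5
        delta[s] = successors                           # lines 7 and 9
        for r in successors:
            if r not in states:                         # line 6: set equality
                states.add(r)                           # line 8
                unprocessed.append(r)                   # line 10
    return states, initial, delta


def label(s, propositions):
    """All sets X of propositions with Pos(s) inside X and X disjoint from Neg(s)."""
    pos = {f[1] for f in s if f[0] == "ap"}
    neg = {f[1][1] for f in s if f[0] == "not"}
    if pos & neg:
        return set()
    free = sorted(set(propositions) - pos - neg)
    return {
        frozenset(pos | set(extra))
        for k in range(len(free) + 1)
        for extra in combinations(free, k)
    }
```

Since states are compared as sets of formulas, the fewer formulas an instantiation stores in a state, the more often a newly computed successor matches an existing state.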

4 GPVW, GPVW+, and LTL2AUT 

GPVW is obtained by instantiating the Boolean functions parameterizing the
previously described core in the following way. HAS_TO_BE_STORED($\mu$) returns
T. CONTRADICTION($\mu$, ToCover, Current, Covered) returns T iff $\mu$ is F or $\mu$ is
a literal such that $\neg\mu \in$ Current. REDUNDANT($\mu$, ToCover, Current, Covered)
returns F. SATISFY($s$, $\mu$) returns T iff $\mu \in s$.

For GPVW+ we have the following instantiations. HAS_TO_BE_STORED($\mu$)
returns T iff $\mu$ is a U-formula or $\mu$ is the righthand argument of a U-formula.
CONTRADICTION($\mu$, ToCover, Current, Covered) returns T iff $\mu$ is F or the
negation normal form of $\neg\mu$ is in Covered. REDUNDANT($\mu$, ToCover, Current,
Covered) returns T iff $\mu$ is $\eta\, U\, \nu$ and $\nu \in$ ToCover $\cup$ Current, or $\mu$ is $\eta\, V\, \nu$ and
$\eta, \nu \in$ ToCover $\cup$ Current. SATISFY($s$, $\mu$) returns T iff $\mu \in s$.

GPVW+ attempts to generate fewer states than GPVW by reducing the formulas
to store in Current and by detecting redundancies and contradictions as
soon as possible. Indeed, by reducing the formulas to store in Current, GPVW+
increases the possibility of finding matching states, while early detection of
contradictions and redundancies avoids producing the part of the automaton for
dealing with them. However, GPVW+ still does not solve some basic problems.
First, states obtained by dealing with a U-formula contain either the U-formula
or its righthand argument. So, for example, states generated for the righthand
argument of $\mu\, U\, \eta$ are equivalent to, but do not match, prior existing states
generated for $\eta$. Second, redundancy and contradiction checks are performed by
explicitly looking for the source of redundancy or contradiction. So, for example,
a U-formula whose righthand argument is a conjunction is considered redundant
if such conjunction appears among the covered formulas, but it is not if, instead
of the conjunction, its conjuncts are present.

LTL2AUT overcomes the above problems in a very simple way: only the
elementary formulas are stored in Current, while information about the
non-elementary ones is derived from the elementary ones and the ones stored in
ToCover using quick syntactic techniques. More in detail, we inductively define
the set $\mathcal{SI}(A)$ of the formulas syntactically implied by the set of formulas $A$ as
follows:

- $T \in \mathcal{SI}(A)$,

- $\mu \in \mathcal{SI}(A)$, if $\mu \in A$,

- $\mu \in \mathcal{SI}(A)$, if $\mu$ is non-elementary and either $\alpha_1(\mu) \subseteq \mathcal{SI}(A)$ or
  $\alpha_2(\mu) \subseteq \mathcal{SI}(A)$.
LTL2AUT then requires the following settings. HAS_TO_BE_STORED($\mu$) returns
F. CONTRADICTION($\mu$, ToCover, Current, Covered) returns T iff the negation
normal form of $\neg\mu$ belongs to $\mathcal{SI}$(ToCover $\cup$ Current). REDUNDANT($\mu$, ToCover,
Current, Covered) returns T iff $\mu \in \mathcal{SI}$(ToCover $\cup$ Current) and, if $\mu$ is $\eta\, U\, \nu$,
$\nu \in \mathcal{SI}$(ToCover $\cup$ Current). SATISFY($s$, $\mu$) returns T iff $\mu \in \mathcal{SI}(s)$. The special
attention to the righthand arguments of U-formulas in the redundancy check is
for avoiding discarding information required to define the acceptance conditions.
The proof of correctness of LTL2AUT is described in [DGV99].



5 The Test Method 

The existing literature on problem sets and test-generating methods for
LTL and model checking is very poor. Indeed, papers usually test their
results on, in the best cases, only a few instances. The method we have
adopted is based on two analyses:

Average-behavior analysis: For a fixed number $N$ of propositional variables
  and for increasing values $L$ of the length of the formulas, a problem set
  $\mathcal{PS}_{\langle F,N,L \rangle}$ of $F$ random formulas is generated and given in input to the
  procedures to test. After the computation, a statistical analysis is
  performed and the results are plotted against $L$. The process can be repeated
  for different values of $N$.

Temporal-behavior analysis: For a fixed number $N$ of propositional variables,
  a fixed length $L$ of the formulas, and for increasing values $P$ of the
  probability of generating the temporal operators U and V, a problem set
  $\mathcal{PS}_{\langle F,N,L,P \rangle}$ of $F$ random formulas is generated and given in input to the
  procedures to test. After the computation, a statistical analysis is
  performed and the results are plotted against $P$. The process can be repeated
  for different values of $N$ and $L$.
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When generating random formulas from a formula space, for example defined 
by the parameters TV, L, and our target is to cover such space as uniformly as 
possible. This requires that, when generating formulas of length L, we produce 
formulas of length exactly L, and not up to L. Indeed, in the latter way, varying 
L, we give preference to short formulas. Random formulas parameterized by 
yV, L, and F^ are then generated as follows. A unit-length random formula is 
generated by randomly choosing, according to uniform distribution, one variable. 
From now on, unless otherwise specified, randomly chosen stands for randomly 
chosen with uniform distribution. A random formula of length 2 is generated by 
generating op(p), where op is randomly chosen in {-i,A} and p is a randomly 
chosen variable. Otherwise, with probability y of choosing either U or V and 
probability of choosing A, A, or V, the operator op is randomly chosen. 
If op is unary, the random formula of length L is generated as op(/i), for some 
random formula p of length L — 1. Otherwise, if op is binary, for some randomly 
chosen 1 < S' < L — 2, two random formulas pi and p 2 of length S and L — S — 1 
are produced, and the random formula op(/xi, /X 2 ) of length L is generated. Since 
the set of operators we use is {-i. A, A, V,ZY, V}, random formulas for the average- 
behavior analysis are generated by setting F = Note that parentheses are not 
considered. Indeed, our definition generates a syntax tree that makes the priority 
between the operators clear. 
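
The generation procedure can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that each of $U$ and $V$ is chosen with probability $P/2$ and each of the remaining four operators with probability $(1-P)/4$, and the tuple encoding, the ASCII operator names and the function names are all hypothetical.

```python
import random

UNARY = ("!", "X")               # negation and next
BINARY = ("&", "|", "U", "V")    # and, or, until, release


def random_formula(n_vars, length, p_temporal, rng):
    """Return a random formula with exactly `length` symbols as a nested tuple."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return f"p{rng.randrange(n_vars)}"           # uniformly chosen variable
    if length == 2:
        # a binary operator cannot fit in two symbols, so pick a unary one
        return (rng.choice(UNARY), random_formula(n_vars, 1, p_temporal, rng))
    ops = ("U", "V", "!", "X", "&", "|")
    weights = [p_temporal / 2] * 2 + [(1 - p_temporal) / 4] * 4
    op = rng.choices(ops, weights=weights, k=1)[0]
    if op in UNARY:
        return (op, random_formula(n_vars, length - 1, p_temporal, rng))
    split = rng.randint(1, length - 2)               # 1 <= S <= L - 2
    left = random_formula(n_vars, split, p_temporal, rng)
    right = random_formula(n_vars, length - split - 1, p_temporal, rng)
    return (op, left, right)


def formula_length(formula):
    """Count the symbols (variables and operators) of a formula."""
    if isinstance(formula, str):
        return 1
    return 1 + sum(formula_length(sub) for sub in formula[1:])


rng = random.Random(0)
sample = random_formula(n_vars=3, length=10, p_temporal=1 / 3, rng=rng)
print(sample, formula_length(sample))
```

Because the operator symbol counts as one unit of length, every generated formula has length exactly $L$, which is the property the uniform-coverage argument relies on.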

In both the above analyses, the parameters we are interested in are the size of
the automata, namely states and transitions, and the time required for their
generation. When comparing two procedures $\Pi_1$ and $\Pi_2$ with respect to some
problem set $\mathcal{PS}$ and parameter $\theta$, we perform the following statistical
analysis. First, we compute the mean value of the outputs of $\Pi_1$ and $\Pi_2$ separately,
and then consider their ratio that, hereafter, is denoted by
$E(\Pi_1, \theta, \mathcal{PS}) / E(\Pi_2, \theta, \mathcal{PS})$.

A different statistical analysis of the data is described in [DGV99]. 
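
As a small illustration of the statistic used here, the sketch below computes the ratio of the two means (as opposed to the mean of per-formula ratios). The function name and the sample numbers are made up for the example and are not measurements from the paper.

```python
from statistics import mean


def mean_ratio(outputs_1, outputs_2):
    """Ratio of the mean output of procedure 1 to the mean output of procedure 2."""
    return mean(outputs_1) / mean(outputs_2)


# Hypothetical state counts produced by two procedures on the same problem set.
states_proc_1 = [3, 5, 4]
states_proc_2 = [6, 10, 8]
print(mean_ratio(states_proc_1, states_proc_2))
```

A value below 1 means that the first procedure produces smaller outputs on average for that parameter.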

6 Results 

LTL2AUT, GPVW, and GPVW+ have been implemented on top of the same
kernel, and are accessible through command line options. The code consists of
1400 lines of C plus 110 lines for a lex/yacc parser. The code has been compiled
with gcc version 2.7.2.3 and executed under the SunOS 5.5.1 operating
system on a SUNW UltraSPARC-II/296 1G.

LTL2AUT and GPVW+ have been compared, according to the test method
discussed in Section 5, on 5700 randomly generated formulas. The results
are shown in Figure 4. For the average-behavior analysis, LTL2AUT and
GPVW+ have been compared on 3300 random formulas generated, according
to our test method, for $F = 100$, $N = 1, 2, 3$, and $L = 5, 10, \ldots, 55$. Formulas
have been collected in 3 groups, for $N = 1, 2, 3$, and inside each group
partitioned into 11 problem sets of 100 formulas each, for $L = 5, 10, \ldots, 55$. For
each group, the ratios

$$\frac{E(\text{LTL2AUT}, \mathit{states}, \mathcal{PS}_{\langle 100,N,L\rangle})}{E(\text{GPVW+}, \mathit{states}, \mathcal{PS}_{\langle 100,N,L\rangle})}, \quad \frac{E(\text{LTL2AUT}, \mathit{transitions}, \mathcal{PS}_{\langle 100,N,L\rangle})}{E(\text{GPVW+}, \mathit{transitions}, \mathcal{PS}_{\langle 100,N,L\rangle})}, \quad \frac{E(\text{LTL2AUT}, \mathit{time}, \mathcal{PS}_{\langle 100,N,L\rangle})}{E(\text{GPVW+}, \mathit{time}, \mathcal{PS}_{\langle 100,N,L\rangle})}$$

have been plotted against $L$. The results show
that LTL2AUT clearly outperforms GPVW+, with respect to both the size of
automata and computation time. Indeed, just considering formulas of length
30, LTL2AUT produces on average less than 60% of the states of GPVW+
(for transitions the situation is even better), spending on average less than 30%
of the time of GPVW+. Moreover, the initial phase, in which LTL2AUT does
have a time overhead with respect to GPVW+, affects formulas, for $L = 5$ and
$N = 3$, which are solved by LTL2AUT in at most 0.000555 CPU seconds, as
opposed to the most demanding sample for $L = 55$ and $N = 3$, which is solved by
LTL2AUT in 6659 CPU seconds.

Fig. 4. LTL2AUT vs. GPVW+. Upper row: Average-behavior analysis, $F = 100$,
$N = 1, 2, 3$, $L = 5, 10, \ldots, 55$. Middle and lower rows: Temporal-behavior analysis,
$F = 100$, $N = 1, 2, 3$, $L = 20, 30$, $P = 0.3, 0.5, 0.7, 0.95$.

For the temporal-behavior analysis, LTL2AUT
and GPVW+ have been compared over 2400 random formulas generated for
$F = 100$, $N = 1, 2, 3$, $L = 20, 30$, and $P = 0.3, 0.5, 0.7, 0.95$. Note that $P = 0.3$
is the probability we have assumed for the average-behavior analysis. Formulas
have been collected in 3 groups, for $N = 1, 2, 3$, and inside each group
partitioned into 2 sub-groups, for $L = 20, 30$. Each sub-group has been further partitioned
into 4 problem sets, for $P = 0.3, 0.5, 0.7, 0.95$. For each sub-group, we have plotted
the ratios

$$\frac{E(\text{LTL2AUT}, \mathit{states}, \mathcal{PS}_{\langle 100,N,L,P\rangle})}{E(\text{GPVW+}, \mathit{states}, \mathcal{PS}_{\langle 100,N,L,P\rangle})}, \quad \frac{E(\text{LTL2AUT}, \mathit{transitions}, \mathcal{PS}_{\langle 100,N,L,P\rangle})}{E(\text{GPVW+}, \mathit{transitions}, \mathcal{PS}_{\langle 100,N,L,P\rangle})}, \quad \frac{E(\text{LTL2AUT}, \mathit{time}, \mathcal{PS}_{\langle 100,N,L,P\rangle})}{E(\text{GPVW+}, \mathit{time}, \mathcal{PS}_{\langle 100,N,L,P\rangle})}$$

against $P$. Again, the results demonstrate that
LTL2AUT clearly outperforms GPVW+.

Fig. 5. GPVW+ vs. GPVW. Upper row: Average-behavior analysis, $F = 100$,
$N = 1, 2, 3$, $L = 5, 10, \ldots, 40$. Middle and lower rows: Temporal-behavior analysis,
$F = 100$, $N = 1, 2, 3$, $L = 10, 20$, $P = 0.3, 0.5, 0.7, 0.95$.

The comparison between GPVW+ and GPVW, whose results are shown in
Figure 5, follows the lines of the previous one, changing only some parameters
to allow GPVW to compute in reasonable time. The average-behavior
analysis has been carried out over 2400 random formulas generated for $F = 100$,
$N = 1, 2, 3$, and $L = 5, 10, \ldots, 40$. The temporal-behavior analysis has been
performed over 2400 random formulas generated for $F = 100$, $N = 1, 2, 3$,
$L = 10, 20$, and $P = 0.3, 0.5, 0.7, 0.95$. The results show that GPVW+ clearly
outperforms GPVW both in the size of automata and, after an expected initial
phase, also in time. The initial phase concerns formulas, for $L = 10$ and $N = 3$,
which are solved by GPVW+ in at most 0.004226 CPU seconds, as opposed to
the hardest sample for $L = 40$ and $N = 3$, which is solved by GPVW+ in 178
CPU seconds.

Finally, a direct comparison between LTL2AUT and GPVW can be found in 
[DGV99]. 

7 Conclusions 

We have demonstrated that the algorithm for building an automaton from a
linear temporal logic formula can be significantly improved. Moreover, we have
proposed a test methodology that can also be used for evaluating other LTL
deciders, and whose underlying concept, namely targeting a uniform coverage
of the formula space, can be exported to other logics. Of course, the notion of
uniform coverage can be further refined, and this is part of our future work. In
particular, we plan to adapt to LTL the probability distributions proposed in
[MSL92] for propositional logic and adapted in [GS96] to the modal logic K.
These distributions assign equal probabilities to formulas of the same structure
(e.g., 3-CNF in the propositional case). We are also planning to extend the
concept of syntactic implication to a semantic one and, finally, to explore automata
generation in the symbolic framework.

Abstract. In this paper we extend one of the main tools used in verification
of discrete systems, namely Binary Decision Diagrams (BDD), to
treat probabilistic transition systems. We show how probabilistic vectors
and matrices can be represented canonically and succinctly using
probabilistic trees and graphs, and how simulation of large-scale probabilistic
systems can be performed. We consider this work as an important
contribution of the verification community to numerous domains which need
to manipulate very large matrices.



1 Introduction 

Many problems in discrete verification can be reduced to the following one:
given a non-deterministic finite-state automaton $A = (Q, \delta)$ and a set $P \subseteq Q$ of
states, find the set $P^*$ of all the states reachable from $P$. One common way to
do this calculation is to let $P^0 = P$ and $P^{i+1} = \delta(P^i)$ until $P^{i+1}$ is included in
the union $P^0 \cup \ldots \cup P^i$. Here $P^i$ is the set of states reachable from $P$ after
exactly $i$ steps.

This method can be formulated using Boolean state-vectors and transition
matrices. Each subset $P$ of an $n$-element set of states can be written as an
$n$-dimensional Boolean row vector $p$ (a function from $Q$ to $\{0, 1\}$) and any
transition relation $\delta$ as an $n \times n$ Boolean matrix $M_\delta$ (a function from $Q \times Q$ to
$\{0, 1\}$). Thus, the calculation step $P^{i+1} = \delta(P^i)$ is equivalent to the
multiplication of a vector by a matrix: $p^{i+1} = p^i \cdot M_\delta$. For example, consider Figure 1
where a 5-state automaton is depicted along with its corresponding $5 \times 5$
matrix $M_\delta$. The reader can verify that calculating the states reachable in one step
from $P = \{1, 2\}$ is done via the multiplication $[1, 1, 0, 0, 0] \cdot M_\delta = [0, 1, 1, 0, 1]$
where logical conjunction and disjunction replace multiplication and addition,
respectively.
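
The Boolean vector-by-matrix formulation can be sketched in a few lines. The matrix of Figure 1 is not reproduced in the text, so the 4-state transition matrix below is a hypothetical example, and the function names are illustrative.

```python
def bool_step(p, m):
    """One step p * M over the Boolean semiring (AND for product, OR for sum)."""
    n = len(p)
    return [any(p[i] and m[i][j] for i in range(n)) for j in range(n)]


def reachable(p0, m):
    """Iterate P^{i+1} = delta(P^i) until nothing new is reached."""
    seen = list(p0)
    frontier = list(p0)
    while True:
        frontier = bool_step(frontier, m)
        if all(s or not f for s, f in zip(seen, frontier)):
            return seen                      # frontier is included in the union so far
        seen = [s or f for s, f in zip(seen, frontier)]


# Hypothetical automaton: 0 -> 1, 1 -> 2, 2 -> 2, 3 -> 0.
M = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]
start = [True, False, False, False]
print(bool_step(start, M))
print(reachable(start, M))
```

Explicit vectors of this kind have one entry per state, which is exactly what becomes infeasible when the state space is a product of many components.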

Probabilistic transition systems, such as discrete Markov chains, operate in
a similar but different fashion. At any given stage of the system's evolution the
state is given by a probability function $p : Q \to [0, 1]$ such that $\sum_{q \in Q} p(q) = 1$.
The transition structure is probabilistic as well and is represented by a function
$\delta : Q \times Q \to [0, 1]$ where $\delta(q, q')$ denotes the conditional probability of being
in $q'$ in the next state given that the current state is $q$. The evolution from
one probabilistic state vector to another is captured by the vector-by-matrix
multiplication $p^{i+1} = p^i \cdot M_\delta$, this time over the reals.

Fig. 1. A non-deterministic automaton and its transition matrix.

This work was partially supported by the European Community Esprit-LTR Project
26270 VHS (Verification of Hybrid systems) and the French-Israeli collaboration
project 970maefut5 (Hybrid Models of Industrial Plants).

N. Halbwachs and D. Peled (Eds.): CAV'99, LNCS 1633, pp. 261-273, 1999.

The state-explosion problem, also known as the curse of dimensionality, arises
when the system under consideration is composed of many sub-systems. The 
size of the global state-space is exponential in the number of components and 
verification by explicit enumeration of states and transitions becomes impossible. 
Symbolic methods provide an alternative to explicit state enumeration. They are 
based on the following observation: the global state-space of a composed system 
can be encoded naturally using state-variables (a variable for the local state 
of each component). The evolution of each variable usually depends on a small 
subset of the other variables and the corresponding transition law can be written 
concisely as a formula in some adequate formalism (e.g. propositional logic when 
the variables are Boolean) and the global transition relation is a conjunction of 
such formulae. Similarly, sets of states can be written down as formulae. With the 
aid of appropriate data-structures, a symbolic version of the basic computation
can be performed, calculating a (hopefully concise) representation
of $P^{i+1}$ from a representation of $P^i$ and $\delta$.

In verification of systems modeled as automata this technique is called
symbolic model-checking [McM93,BCM+93] and it has had great success. In fact it can
be seen as one of the breakthroughs in verification, facilitating the analysis of
systems with hundreds of state variables, far beyond the capabilities of explicit
enumeration on current and future computers. The most popular representation
scheme used in symbolic verification is the binary decision diagram (BDD),
which is a formalism for representing Boolean functions, admitting the following
properties [B86,MT98]:

1. It is canonic: given an ordering of the variables, a unique BDD corresponds
to every Boolean function.

2. There are relatively efficient algorithms for manipulating BDDs, in particular
for the operations needed to compute $P^{i+1}$ from $P^i$.

3. It performs well in the analysis of many structured systems: the size of the
BDD remains small relative to the size of the state-space.

The goal of the paper is to apply this recipe to probabilistic systems, that
is, to define a representation formalism for probabilistic vectors and transition
functions such that the operation $p \cdot M_\delta$ could be performed for systems
for which it is impossible to do so using currently existing methods. To this end
we define probabilistic decision graphs (PDG)¹, a data-structure for representing
probabilities over structured domains which enjoys the nice properties of BDDs.

The rest of the paper is organized as follows. In Section 2 we present
probabilistic decision trees and graphs and show that they constitute a canonic representation
for probabilities. In Section 3 we rephrase the basic definitions of Markov chains.
Section 4 is devoted to the representation of probabilistic transition functions by
conditional probabilistic graphs and sketches the PDG structure of some generic
classes of probabilistic systems. The calculation of next-state probabilities on
PDGs via the projection operation is described in Section 5 and some preliminary
experimental results are reported in Section 6. Finally we discuss the
significance of this work and mention some of the previous relevant applications of
BDD technology outside the Boolean realm.

2 Probabilistic Decision Graphs 

Let $B = \{0, 1\}$. We assume an underlying set $Q = B^n$, and a probability
distribution on it, i.e. a function $p : Q \to [0, 1]$ such that $\sum_{q \in Q} p(q) = 1$. This function
can be extended naturally to subsets of $Q$ by letting $p(Q') = \sum_{q \in Q'} p(q)$ for every
$Q' \subseteq Q$. We will abuse strings from $B^{\le n}$ (the set of binary strings of length not
greater than $n$) to denote certain subsets of $B^n$. A string $u = x_1 x_2 \cdots x_n$ will
stand for the singleton $\{(x_1, \ldots, x_n)\}$ while a string $x_1 x_2 \cdots x_i$, $i < n$, will stand
for the set $\{(x_1, \ldots, x_i, x_{i+1}, \ldots, x_n) : (x_{i+1}, \ldots, x_n) \in B^{n-i}\}$. This can be
defined recursively by associating with $u$ the union of the sets associated with $u0$
and $u1$. Note that the empty string $\varepsilon$ denotes the whole $B^n$. To avoid additional
symbols we use the same notation for a string and for the set it denotes. The set
$B^{\le n}$ has a binary tree structure and every level $B^i$ corresponds to a partition of
$B^n$. The next definition is the essence of this paper.

Definition 1 (Probabilistic Decision Trees). A probabilistic decision tree
(PDT) of depth $n$ is a tuple $P = (S, 0, 1, v)$ where $S = B^{\le n}$,² 0 and 1 are
respectively the left-successor and right-successor partial functions on $S$ and
$v : S \to [0, 1]$ is a function satisfying $v(\varepsilon) = 1$ and, for every non-leaf node $s$,
$v(s0) + v(s1) = 1$.

Theorem 1 (Unique Representation). There is a one-to-one correspondence
between probabilities on $B^n$ and PDTs.

Proof: First we assign probabilities to nodes by letting $p(\varepsilon) = 1$ and

$$p(sx) = p(s) \cdot v(sx), \quad x \in B \qquad (1)$$

It is not hard to see that all $p$ values are in $[0, 1]$ and that their sum at each level of
the tree is 1. Conversely, given a probability on the leaves, it is straightforward to
calculate the probability of the sets associated with the upper nodes by letting
$p(s) = p(s0) + p(s1)$ and then compute $v$ via normalization, i.e. the inverse
of (1): $v(sx) = p(sx)/p(s)$. In the case when $p(s) = 0$ we can put any number in
$v(sx) = 0/0$, and a convention such as 1/2 can be used. □

¹ We say "graphs" instead of "diagrams" to avoid yet another xDD acronym.

² In our definition there is an implicit ordering on the "variables".

PDTs are nothing but the presentation of probabilities using the so-called
"chain-rule", the probabilistic analogue of Shannon factorization of Boolean
functions which underlies BDDs:

$$p(x_1 x_2 \cdots x_n) = p(x_1) \cdot p(x_1 x_2 \mid x_1) \cdots p(x_1 x_2 \cdots x_n \mid x_1 \cdots x_{n-1})$$

where $p(r \mid s)$ is the conditional probability of $r$ given $s$. We will replace this
unfortunate (but very common) notation with $p_s(r)$ such that the above rule
will be written as

$$p(x_1 x_2 \cdots x_n) = p(x_1) \cdot p_{x_1}(x_1 x_2) \cdots p_{x_1 \cdots x_{n-1}}(x_1 \cdots x_n).$$
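
The correspondence between a probability and its PDT can be checked on a small case. The sketch below is an illustration under stated assumptions: nodes are encoded as bit strings, the distribution over $B^3$ is an illustrative one, the function names are hypothetical, and the value 1/2 is used as the convention for nodes of probability zero.

```python
from fractions import Fraction as Fr
from itertools import product


def build_pdt(p_leaf, n):
    """Return (p, v): node probabilities and normalized node values of the PDT."""
    p = dict(p_leaf)
    for length in range(n - 1, -1, -1):          # fill upper levels bottom-up
        for bits in product("01", repeat=length):
            s = "".join(bits)
            p[s] = p[s + "0"] + p[s + "1"]       # p(s) = p(s0) + p(s1)
    v = {"": Fr(1)}
    for s in p:
        if s:
            parent = p[s[:-1]]
            v[s] = p[s] / parent if parent != 0 else Fr(1, 2)
    return p, v


def leaf_probability(v, leaf):
    """Chain rule: multiply the v values along the path from the root to the leaf."""
    prob = Fr(1)
    for i in range(1, len(leaf) + 1):
        prob *= v[leaf[:i]]
    return prob


dist = {
    "000": Fr(1, 6), "001": Fr(0), "010": Fr(2, 15), "011": Fr(1, 30),
    "100": Fr(4, 15), "101": Fr(1, 15), "110": Fr(1, 15), "111": Fr(4, 15),
}
p, v = build_pdt(dist, 3)
print(v["0"], v["01"], v["010"], leaf_probability(v, "010"))
```

Exact rational arithmetic is used only to make the round trip easy to verify; it is not a claim about how node values are stored in the implementation.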

Decision trees are exponential in the number of variables and, by themselves,
do not solve the state-explosion problem. However, when there is some structure
in the objects they represent, different nodes may have identical sub-trees and 
the tree can be represented concisely by a directed acyclic graph (DAG) carrying 
the same information. The transformation of a tree into a DAG is a variation of 
the classical procedure for minimizing automata, and can be phrased as follows. 

Definition 2 (Probabilistic Decision Graphs). Let $P = (S, 0, 1, v)$ be a
PDT and let $\sim$ be a congruence relation³ on $S$ defined as $s \sim s'$ if $v(s) = v(s')$
and both $s0 \sim s'0$ and $s1 \sim s'1$. The associated probabilistic decision graph
(PDG) is $G = (S/{\sim}, 0, 1, v)$.

In other words, the nodes of $G$ are the equivalence classes of $\sim$. Graphically
speaking, the process starts from the bottom of the tree by merging leaves $sx$
and $s'x'$ which have identical $v$'s. Then the edge from $s$ labeled by $x$ and the
edge from $s'$ labeled by $x'$ are redirected toward the merged node and the process
continues recursively upward. Note that $sx = \bot$ for a leaf $s$, hence $s \sim s'$ only if
both belong to the same level of the tree.

Example: Consider the following probability function over $B^3$:

000    001    010    011    100    101    110    111

1/6    0      2/15   1/30   4/15   1/15   1/15   4/15

Figure 2-(a) shows the probabilities of all subsets in $B^{\le 3}$. The PDT in Figure 2-(b)
is obtained via the normalization $v(sx) = p(sx)/p(s)$. The reduction modulo
$\sim$ into a PDG starts in Figure 2-(c) by merging identical leaves and terminates
in Figure 2-(d) by merging some of their parents.⁴ As in BDDs, when there is
a lot of independence between the variables, the size of the PDG is much smaller
than the size of $Q$. In the rest of the paper we describe algorithms in terms of
full trees, bearing in mind that the actual implementation reduces every tree
into its corresponding minimal DAG.

³ Congruence with respect to the 0 and 1 operations.

⁴ Unlike BDDs we do not go further and eliminate nodes whose left and right successors
are identical: we restrict ourselves to balanced DAGs where all paths from the root
to the leaves are of the same length, otherwise we cannot satisfy the requirement
that the sum of the leaves at every level is 1.







Fig. 2. Transforming a probability function (a) into a PDT (b) and successively via 
(c) into a PDG (d). 
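
The reduction of a PDT into a PDG can be sketched with a table of unique nodes, in the spirit of the unique tables of BDD packages. This is an illustration only: the bit-string encoding, the function names and the sample distribution over $B^3$ are assumptions, and the sketch walks the full tree instead of building the graph incrementally.

```python
from fractions import Fraction as Fr
from itertools import product


def pdt_values(p_leaf, n):
    """Normalized node values v(sx) = p(sx) / p(s), with 1/2 used when p(s) = 0."""
    p = dict(p_leaf)
    for length in range(n - 1, -1, -1):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            p[s] = p[s + "0"] + p[s + "1"]
    return {s: (Fr(1) if not s else (p[s] / p[s[:-1]] if p[s[:-1]] else Fr(1, 2)))
            for s in p}


def reduce_to_pdg(v, n):
    """Merge congruent nodes; returns (root id, table of unique nodes)."""
    table = {}  # (value, id of 0-successor, id of 1-successor) -> node id

    def node_id(s):
        if len(s) == n:                      # leaf: no successors
            key = (v[s], None, None)
        else:
            key = (v[s], node_id(s + "0"), node_id(s + "1"))
        return table.setdefault(key, len(table))

    return node_id(""), table


dist = {
    "000": Fr(1, 6), "001": Fr(0), "010": Fr(2, 15), "011": Fr(1, 30),
    "100": Fr(4, 15), "101": Fr(1, 15), "110": Fr(1, 15), "111": Fr(4, 15),
}
v = pdt_values(dist, 3)
root, table = reduce_to_pdg(v, 3)
print(len(v), "tree nodes reduced to", len(table), "graph nodes")
```

Two nodes receive the same identifier exactly when they carry the same value and their successors have already been merged, which mirrors the congruence of Definition 2; since leaf keys never coincide with keys of inner nodes, merging stays within a level.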



3 Markov Transition Functions 

Having defined a canonical representation for probabilistic state vectors, we now
move to the representation of transition matrices. In a non-probabilistic setting
there is not much difference between sets (subsets of $B^n$) and relations (subsets
of $B^{2n}$) and both can be represented by BDDs of the same type. For probabilistic
systems, we must be more careful.

Definition 3 (Markov Transition Function). A Markov transition function
on $Q$ is a function $\delta : Q \to (Q \to [0, 1])$ such that for every $q \in Q$, $\delta_q : Q \to [0, 1]$
is a probability function on $Q$.

In 20th century mathematics, such functions used to be written as $|Q| \times |Q|$
matrices such as

$$M_\delta = \begin{pmatrix} \delta_1(1) & \delta_1(2) & \cdots & \delta_1(n) \\ \delta_2(1) & \delta_2(2) & \cdots & \delta_2(n) \\ \vdots & \vdots & & \vdots \\ \delta_n(1) & \delta_n(2) & \cdots & \delta_n(n) \end{pmatrix}$$

where each line represents a particular $\delta_q$. The action of $\delta$ on a probabilistic
state-vector $p$ can be decomposed into two stages. The first can be viewed as
applying a function $\hat\delta : (Q \to [0, 1]) \to (Q \times Q \to [0, 1])$ where $\hat{p} = \hat\delta(p)$ if for
every $q, q' \in Q$, $\hat{p}(q, q') = p(q) \cdot \delta_q(q')$. In other words, given that the current
state probability is $p$, $\hat\delta(p)$ denotes the probability of any transition to happen.
Matrix-wise, when $p$ is written as a vector $[p_1, \ldots, p_n]$, calculating $\hat\delta(p)$ amounts
to multiplying every element of $p$ by the elements of its corresponding row in $M_\delta$
to obtain

$$M_{\hat\delta(p)} = \begin{pmatrix} p_1 \cdot \delta_1(1) & p_1 \cdot \delta_1(2) & \cdots & p_1 \cdot \delta_1(n) \\ p_2 \cdot \delta_2(1) & p_2 \cdot \delta_2(2) & \cdots & p_2 \cdot \delta_2(n) \\ \vdots & \vdots & & \vdots \\ p_n \cdot \delta_n(1) & p_n \cdot \delta_n(2) & \cdots & p_n \cdot \delta_n(n) \end{pmatrix}$$

Note that unlike $\delta$, $\hat\delta(p)$ is a probability function on $Q \times Q$.

The probability of being in the next step at a state $q'$ is then the sum of the
probabilities of the form $\hat{p}(q, q')$, i.e. those leading to $q'$. This can be captured by
a function $\pi : (Q \times Q \to [0, 1]) \to (Q \to [0, 1])$ defined as $\pi(\hat{p})(q') = \sum_{q \in Q} \hat{p}(q, q')$.
Matrixly speaking, this is equivalent to summing up every column of $M_{\hat\delta(p)}$ to obtain a
vector $p'$. Hence the composition $\pi \circ \hat\delta : (Q \to [0, 1]) \to (Q \to [0, 1])$ gives the
evolution of the system as the action of a probabilistic transition matrix on a
probabilistic state vector.⁵

Next we define a data-structure for representing $\delta$ when $Q = B^n$ and a natural
way to transform it, given a PDG-represented probability $p$, into a PDG of depth
$2n$ for $\hat\delta(p)$. After that we define the basic operation on PDGs, the projection,
which is used in the calculation of $\pi$.
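
The two-stage decomposition of a transition step can be illustrated on explicit vectors and matrices. The sketch below works on plain lists, not on PDGs; the function names and the 2-state chain are made up for the example.

```python
from fractions import Fraction as Fr


def joint_transition(p, delta):
    """First stage: scale row q of the matrix by p[q], giving a probability on Q x Q."""
    n = len(p)
    return [[p[q] * delta[q][r] for r in range(n)] for q in range(n)]


def column_sums(joint):
    """Second stage: sum every column to obtain the next-state probabilities."""
    n = len(joint)
    return [sum(joint[q][r] for q in range(n)) for r in range(n)]


# Hypothetical 2-state Markov chain and current state distribution.
delta = [[Fr(1, 4), Fr(3, 4)],
         [Fr(1, 2), Fr(1, 2)]]
p = [Fr(1, 2), Fr(1, 2)]

joint = joint_transition(p, delta)
p_next = column_sums(joint)
print(joint, p_next)
```

The intermediate object sums to 1 over all pairs of states, which is why it can itself be represented as a probability over a domain with twice as many variables.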



4 Conditional Probabilistic Decision Graphs 

The basic idea is to extend PDTs such that nodes at certain levels of the tree
are empty (with $v$ undefined) to denote undetermined variables.⁶ To this end we
will use somewhat more elaborate notations.

Let $X = \{1^x, 2^x, \ldots, n^x\}$ and $Y = \{1^y, 2^y, \ldots, n^y\}$ be two copies of $\{1, \ldots, n\}$.
An order relation $\prec$ on $X \cup Y$ can be written as a bijection $J : \{1, 2, \ldots, 2n\} \to X \cup Y$.
Without loss of generality we assume that $\prec$ is compatible with the natural
ordering of $X$ and of $Y$, i.e. $1^x \prec 2^x \prec \cdots \prec n^x$. Given $J$, any binary string
$s \in B^{\le 2n}$ can be mapped into a pair of strings $J^x(s)$ and $J^y(s)$ from $B^{\le n}$. For
example, if $J = 1^x \prec 2^x \prec 1^y \prec 3^x \prec 2^y \prec 3^y$ then for a string $s = x_1 x_2 y_1 x_3 y_2 y_3$,
$J^x(s) = x_1 x_2 x_3$ and $J^y(s) = y_1 y_2 y_3$. We also extend our string notation for sets:
a string of the form $x_{i_1} x_{i_2} \cdots x_{i_k}$ with $0 < i_1 < i_2 < \cdots < i_k \le n$ will denote
a subset of $B^n$ with the obvious meaning, i.e. the set of $n$-tuples such that the
value of every $i_j$-coordinate is $x_{i_j}$.

⁵ For those familiar with BDDs, we mention that these operations resemble the
non-probabilistic ones: $\hat{p}(q, q') = p(q) \wedge \delta(q, q')$ and $\pi(q') = \exists q\, \hat{p}(q, q') = \bigvee_q \hat{p}(q, q')$.

⁶ In fact we could have started the paper by defining data-structures for conditional
probability functions, with a partition of variables into two types. This way we
could obtain probability functions as the special case where all the variables are
determined, and Markov transition functions as a special case where the sizes of the
two sets of variables are the same and certain restrictions are imposed on variable
dependencies. However, we prefer clarity over generality.

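A Markov transition function over $B^n$ is a function $\delta : B^n \to (B^n \to [0, 1])$
whose instances are written as $\delta_{x_1 \cdots x_n}(y_1 \cdots y_n)$. For every $x_1 \cdots x_n$, $\delta_{x_1 \cdots x_n}$ is
a probability function which can be written using the chain rule just as any
other probability:

$$\delta_{x_1 \cdots x_n}(y_1 \cdots y_n) = \delta_{x_1 \cdots x_n}(y_1) \cdot \delta_{x_1 \cdots x_n y_1}(y_1 y_2) \cdots \delta_{x_1 \cdots x_n y_1 \cdots y_{n-1}}(y_1 \cdots y_n).$$

We restrict our attention to Markov chains in which every coordinate of the
state-space behaves causally, i.e. it depends only on the previous values of the
state variables.⁷ This means that for every $x_1 \ldots x_n$ and every $y_j$ we have
$\delta_{x_1 \cdots x_n y_i}(y_j) = \delta_{x_1 \cdots x_n}(y_j)$. Hence $\delta$ can be written as:

$$\delta_{x_1 \cdots x_n}(y_1 \cdots y_n) = \delta_{x_1 \cdots x_n}(y_1) \cdot \delta_{x_1 \cdots x_n}(y_2) \cdots \delta_{x_1 \cdots x_n}(y_n). \qquad (2)$$

We say that $j^y$ is independent of $i^x$ if for every $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$,

$$\delta_{x_1 \cdots x_{i-1} 0 x_{i+1} \cdots x_n}(y_j) = \delta_{x_1 \cdots x_{i-1} 1 x_{i+1} \cdots x_n}(y_j).$$

In this case we can use the notation $\delta_{x_1 \cdots x_{i-1} x_{i+1} \cdots x_n}(y_j)$. When this is not the
case we say that $i^x$ influences $j^y$ and denote it by $i^x \to j^y$.

An order relation $\prec$ on $X \cup Y$ is compatible with a Markov transition function
$\delta$ iff for every $i^x \in X$, $j^y \in Y$, $i^x \to j^y$ implies $i^x \prec j^y$. The default ordering
$1^x \prec \cdots \prec n^x \prec 1^y \prec \cdots \prec n^y$ is compatible with any $\delta$ and is the only one
compatible with a $\delta$ for which every $j^y$ depends on all of $X$.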

Definition 4 (Conditional PDT and PDG). A conditional probabilistic
decision tree (CPDT) of depth $n$ is a tuple $P = (S, 0, 1, J, v)$ where $S = B^{\le 2n}$, 0
and 1 are as in a PDT, $J$ is the ordering bijection and $v : S \to [0, 1]$ is a partial
function, defined only on nodes $s$ such that $J(|s|) \in Y$, satisfying $v(\varepsilon) = 1$
and for every node $v(s0) + v(s1) = 1$ whenever it is defined. A conditional
probabilistic decision graph (CPDG) is $G = (S/{\sim}, 0, 1, J, v)$ where $\sim$ is the
congruence relation of Definition 2.

Theorem 2 (CPDT = Markov Transition Function). There is a one-to-one
correspondence between Markov transition functions and CPDTs.

Sketch of Proof: Similar to that of Theorem 1. We assume a fixed ordering
bijection $J$ compatible with $\delta$. For every $Y$-node $sy_i$ of the CPDT we associate
$v(sy_i)$ with the conditional probability $\delta_{J^x(s)}(y_i)$, for example $v(x_1 x_2 y_1 x_3 y_2) =
\delta_{x_1 x_2 x_3}(y_2)$. To reconstruct $\delta$ from a tree we go down the tree until we calculate
$\delta$ for the lowest $Y$-nodes. To build a CPDT from $\delta$ we climb up starting from
the $Y$-leaves and construct the tree. □

⁷ Note that one can write Markov transition functions over $Q$ which do not admit
such a causal decomposition, and this observation might be a source of interesting
investigations in the theory of stochastic processes. In fact, the above implies that
every $m$-state Markov chain which admits a causal decomposition can be represented
in space $O(m \log m)$ instead of $O(m^2)$.






Fig. 3. Schematic CPDGs for Markov transition functions which consist of: (a)
independent Bernoulli trials, (b) independent Markov chains, (c) a cascade with k = 2. The
dark nodes indicate Y-nodes.




Fig. 4. A schematic CPDG for an arbitrary (but causal) Markov transition function. 



We mention some classes of probabilistic transition systems such that the
pattern of interaction between their components alone suffices for giving an
upper bound on the size of their CPDGs. Consider first the degenerate case
of $n$ independent Bernoulli trials. It can be modeled as a direct product of $n$
memory-less automata, for which the probability of the next state is independent
of the current state. Thus, $\delta_{x_1 \cdots x_n}(y_1 \cdots y_n)$ can be written as $\delta(y_1) \cdots \delta(y_n)$ and
represented by a CPDG without empty nodes, which is in fact a PDG, as in
Figure 3-(a).

As a slightly less trivial example consider a direct product of $n$ independent
2-state Markov chains. In this case each $y_i$ depends only on $x_i$ and the
transition function can be represented by the CPDG of Figure 3-(b). More generally,
consider a cascade $A_1, \ldots, A_n$ of probabilistic automata where the transition
probabilities of each automaton $A_i$ depend on the states of its $k$ predecessors
(including itself) $A_{i-k+1}, \ldots, A_{i-1}, A_i$. Such systems will have a CPDG of size
$O(n 2^k)$ similar to the one appearing in Figure 3-(c) for $k = 2$.

When there are no such constraints on variable dependencies, the default
order needs to be used and no a-priori upper bound better than $n 2^n$ can be
stated (although some independencies might make the corresponding CPDG
smaller). We repeat that even this bound is better than the $2^{2n}$ size implied
by a straightforward encoding of the transition matrix. The general structure of
such a CPDG is depicted in Figure 4.

Going from $p$ and $\delta$ to $\hat\delta(p)$ is straightforward: take $v(s)$ from the PDT for $p$
and put it in every empty node $s'$ of the CPDT of $\delta$ such that $J^x(s') = s$. This way the
whole tree becomes full and represents the probability $\hat\delta(p)$ over $B^{2n}$.

5 Projection 

The basic operation on probabilities (and PDGs) is the probabilistic analogue of
the elimination of a quantified variable in Boolean functions (and BDDs). This
is what is needed to transform $\hat\delta(p)$ into $\pi(\hat\delta(p))$.

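Definition 5 (Projection). Let $p : B^n \to [0, 1]$ be a probability. The $k$-projection
of $p$ is a function $p_{\downarrow k} : B^{n-1} \to [0, 1]$ defined as

$$p_{\downarrow k}(x_1 \cdots x_{k-1} x_{k+1} \cdots x_n) = p(x_1 \cdots x_{k-1} 0 x_{k+1} \cdots x_n) + p(x_1 \cdots x_{k-1} 1 x_{k+1} \cdots x_n) \qquad (3)$$

Using conditional probabilities, (3) can be rewritten as

$$p(x_1 \cdots x_{k-1}) \cdot \bigl( p_{x_1 \cdots x_{k-1}}(0 x_{k+1} \cdots x_n) + p_{x_1 \cdots x_{k-1}}(1 x_{k+1} \cdots x_n) \bigr)$$

and further as

$$p(x_1 \cdots x_{k-1}) \cdot \bigl( v(x_1 \cdots x_{k-1} 0) \cdot p_{x_1 \cdots x_{k-1} 0}(x_{k+1} \cdots x_n) + v(x_1 \cdots x_{k-1} 1) \cdot p_{x_1 \cdots x_{k-1} 1}(x_{k+1} \cdots x_n) \bigr)$$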

As one can see, performing a $k$-projection on the PDT representation of $p$ consists
of copying the first $k - 1$ levels of the tree and then plugging at each branch
$x_1 \cdots x_{k-1}$ a sub-tree which encodes the weighted sum of the functions $p_{x_1 \cdots x_{k-1} 0}$
and $p_{x_1 \cdots x_{k-1} 1}$. This is the main computational burden in the manipulation of
PDGs. The transformation of a PDT $P = (S, 0, 1, v)$ for $p$ with $S = B^{\le n}$ into a
PDT $P_{\downarrow k} = (S_{\downarrow k}, 0, 1, v_{\downarrow k})$ for $p_{\downarrow k}$ with $S_{\downarrow k} = B^{\le n-1}$ is performed as follows. For
any node $s \in B^{\le k-1}$ we have $p_{\downarrow k}(s) = p(s)$. For the other nodes we have

$$p_{\downarrow k}(x_1 \cdots x_{k-1} s) = p(x_1 \cdots x_{k-1} 0 s) + p(x_1 \cdots x_{k-1} 1 s)$$

These values are calculated from the top down and every calculation of $p_{\downarrow k}(sx)$
is followed by calculating $v_{\downarrow k}(sx)$ as $p_{\downarrow k}(sx)/p_{\downarrow k}(s)$, which in the first
$k - 1$ levels reduces simply to $v(sx)$. Applying this procedure $n$ times⁸
we transform a probability on $B^{2n}$ to a probability on $B^n$ and complete the
computation of $p' = p \cdot M_\delta$. While working with PDGs, one can avoid part of
the computation whenever there is an equivalence of the form $p_{s0} = p_{s1}$. In that
case the weighted sum $r \cdot p_{s0} + (1 - r) \cdot p_{s1}$ is equal to both.

6 Implementation and Experimental Results 

Treating the mathematical real numbers by computer introduces difficulties
that are absent from traditional applications of verification
methodology. The continuum is approximated by a very large (but finite)
subset of the rationals, the floating point numbers. Practitioners seem to be
satisfied with this approximation. It turns out that for exploiting the advantages
of PDGs we had to go further and round node values to multiples of $2^{-m}$ (for $m$
ranging from 3 to 10), otherwise the size of non-trivial PDGs becomes
exponential after a few iterations because of the low probability of two nodes having
exactly the same floating-point value. With this discretization, systems with
limited interaction among variables usually converge to vectors with a small PDG
description. As for the semantic price of the approximation, if we reflect a bit on
the empirical source of probability estimations in models, we realize that these
numbers are not sacred and an initial "imprecision" of $2^{-m}$ does not make any
difference.
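
One simple way to realize such a discretization is sketched below. The text does not say how the rounding is applied to the two successors of a node, so the choice made here (round the value of the 0-successor to the nearest multiple of $2^{-m}$ and give the remainder to the 1-successor, so that the pair still sums to 1) is an assumption, as is the function name.

```python
def discretize_pair(v0, m):
    """Round v0 to a multiple of 2**-m and return (rounded v0, 1 - rounded v0)."""
    step = 2 ** m
    r0 = round(v0 * step) / step
    return r0, 1.0 - r0


print(discretize_pair(0.3, 3))
print(discretize_pair(0.71, 10))
```

After rounding, node values come from a set of only $2^m + 1$ possibilities, which makes it far more likely that two nodes coincide and can be merged.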

We have implemented these data-structures and algorithms and tested their
performance on some generic examples. The implementation is preliminary and
does not yet employ all the optimizations one can find in BDD packages. Let us
first mention the trivial cases. For $n$ randomly generated, mutually independent
Markov chains we can treat almost any $n$. This is, of course, not so impressive if
one realizes that each chain could be simulated separately. Yet someone unaware
of BDDs will be rather surprised to see how fast you can multiply a 2^^ x 2^^ 
transition matrix void of any apparent structure or sparseness (see Table 1). A
slightly less trivial example is a chain of noisy communication channels where
each component copies the value of its predecessor with probability $1 - \epsilon$. Such
a chain converges to a uniform probability vector where $p(q) = 1/2^n$ for every
state. Here again we could iterate for very large $n$ with a linear growth in the
size of the PDGs.

Next, we have tested randomly-generated cascades of communication depth 
2, which using the previously mentioned discretization, usually converge to vec- 
tors with small PDGs, although exponential ones are, of course, still possible. 
We demonstrate the time and space behavior of the algorithm on a family con- 
sisting of a cascade of noisy AND gates such that each component becomes the 
conjunction of its previous value and that of its predecessors (Figure 5) with 

® As in BDDs, this procedure can be extended naturally to a procedure that eliminates 
several variables in a single pass. 
0.000564 0.000093 
0.000653 0.000003 
0.000823 0.000135 
0.000953 0.000005 



0.000412 0.000068 
0.000477 0.000002 
0.000153 0.000025 
0.000177 0.000001 



0.000094 0.000015 
0.000108 0.000001 
0.000137 0.000022 
0.000158 0.000001 



0.000068 0.000011 
0.000079 0.000000 
0.000025 0.000004 
0.000029 0.000000 



0.000727 0.000120 
0.000842 0.000004 
0.001061 0.000175 
0.001229 0.000006 



Table 1. An initial fragment of a 2^^ x 2^^ matrix which can be iterated until conver- 
gence within less than a second. 



probability 0.9. The performance results are depicted in Figure 6 and although 
space behaves nicely, computation time still grows exponentially, reaching almost 
4 hours for n = 54. The reason lies in the fundamental difference between BDDs 
and PDGs: in the former, when an algorithm encounters a node, it does not 
need to remember via which branch the node is reached, and thus the hashing 
mechanism prevents duplicate calls. On the other hand, in PDGs, each time the 
projection procedure is called with a node, it has, as an additional parameter, 
the probability associated with its parent. Hence procedure calls with identical 
arguments are rather rare and the current implementation needs to do exponen- 
tial work on linear-sized PDGs. We are currently investigating improvements of 
the implementation. 




Fig. 5. A chain of noisy AND gates. 





Fig. 6. The PDG size and time until convergence as a function of the number of 
variables, for discretizations of 1/1024 and 1/512. 




M. Bozga and O. Maler 



7 Discussion 

We have introduced and implemented a new method for manipulating large 
probabilistic transition systems. We hope that this technique will improve the 
performance of probabilistic simulation tools. In addition, the investigation of the 
structure of PDGs might contribute to a better understanding of the structure 
of probabilistic functions. The application domains which might benefit from 
such a technique are numerous and include performance and reliability analysis, 
probabilistic verification, planning under uncertainty [P94,BDH99], calculation 
of equilibria in economics, statistical mechanics and more. 

This work is built on what we consider to be the main insight of the BDD 
experience: in many situations the indices of rows and columns in matrices are 
the outcome of “flattening” of much more structured domains. This flattening, 
which is unavoidable if one wants to draw a matrix on a two-dimensional sheet 
of paper, hides the structure of the problem, or at least makes it very hard to 
retrieve.® BDDs and PDGs suggest a way of maintaining this structural infor- 
mation and exploiting it in efficient computations. 

Among previous extensions of BDD technology to represent functions from 
$\mathbb{B}^n$ to $\mathbb{N}$ (motivated chiefly by arithmetical circuits), $\mathbb{R}$ and other domains we 
mention the structure called Multi-terminal BDDs (MTBDD) in [GFM+93] and 
Algebraic Decision Diagrams (ADD) in [BFG+93]. This is a straightforward ex- 
tension of BDDs with leaves having values in non-Boolean domains. Algorithms 
for performing matrix multiplications and other operations on these representa- 
tions have been proposed and applied, for example, to probabilistic verification 
[BGG+97]. The main drawback of MTBDDs/ADDs is that they yield a succinct 
representation only if the corresponding vectors and matrices have a lot of identical 
entries, e.g. sparse matrices having many zeros. In contrast many generic 
examples of functions with no interaction between the variables will lead to exponential 
MTBDDs: for example it is not hard to create probabilities on $\mathbb{B}^n$ with 
all variables mutually independent, and yet no two elements will have the same 
probability. In fact, the ability to represent functions concisely as decision graphs 
without putting any information on the non-leaf nodes is a special property of 
Boolean algebra. 

The above observation has led some researchers in the hardware verification 
community [VPL96,TP97] to consider extending BDDs with values on their edges 
(which is practically the same as putting values on the nodes, as we do here). 
This structure is called Edge-valued BDD (EVBDD) and it has been used to 
encode the so-called Pseudo-Boolean functions which are essentially functions 
from $\{0, 1\}^n$ to $\mathbb{N}$. EVBDDs contain both additive and multiplicative constants 
and in some cases overcome the limitations of MTBDDs. However, since the 
class of functions treated by EVBDDs is much less constrained than the class of 



® Just compare the non-intuitive definition of the Kronecker product (also known 
as Tensor product) of two matrices with the straightforward Cartesian product of 
automata. 
probabilistic functions, normalization and matrix multiplication are much more 
complicated than the ones reported in this paper. 

Finally, let us mention another formalism, related to PDGs, the Bayesian 
Networks which are used extensively in AI [P88,J96]. Like PDGs, Bayesian net- 
works consist of a graphical representation of variables and their probabilistic 
dependencies. The comparison between the two formalisms is outside the scope 
of this paper, but it seems that PDGs can be viewed as a constrained and well-behaved 
sub-class of networks, with a special emphasis on the dynamic aspects 
(next-state probabilities) which makes them, perhaps, more suitable for treating 
large-scale Markov decision processes. 

Acknowledgements: We are grateful to Moshe Tennenholz for raising the pos- 
sibility of applying some verification techniques to AI problems of planning under 
uncertainty. His visit in Grenoble, in fact, triggered this work. We thank Amir 
cular for the observations concerning causal Markov chains and weighted sum of 

Abstract. We address the problem of relating the result of model checking 
a partial state space of a system to the properties actually possessed 
by the system. We represent incomplete state spaces as partial Kripke 
structures, and give a 3-valued interpretation to modal logic formulas 
on these structures. The third truth value $\bot$ means “unknown whether 
true or false”. We define a preorder on partial Kripke structures that 
reflects their degree of completeness. We then provide a logical characte- 
rization of this preorder. This characterization thus relates properties of 
less complete structures to properties of more complete structures. We 
present similar results for labeled transition systems and show a connec- 
tion to intuitionistic modal logic. We also present a 3-valued CTL model 
checking algorithm, which returns $\bot$ only when the partial state space 
lacks information needed for a definite answer about the complete state 
space. 



1 Introduction 

The theory and engineering of model checking has led to tools that can analyze 
systems with millions of states. However, many systems we would like to analyze 
have state spaces that are still often orders of magnitude larger than these tools 
can handle. In this case a common approach is simply to explore just a part of the 
state space; unexplored states and transitions are then absent in the incomplete 
or “partial” state space. 

In model checking a partial state space, the main issue is how answers obtai- 
ned in checking the partial state space relate to properties of the full state space. 
Obviously, one cannot assume that all answers apply to the full state space. The 
partial state space may be lacking “bad” states one is interested in avoiding, or 
“good” states that one is interested in reaching. Naively, one could work only 
with simple safety properties such as “all reachable states satisfy proposition 
p” , with the understanding that if the answer is false in the partial state space 
then it is also false in the complete state space. But this approach too strongly 
restricts the properties we can check. 

Clearly a more systematic understanding is needed. We can work logically 
and ask: which class of properties will hold of the complete state space just if 
they hold of the partial state space? Or we can work operationally and ask: 
how should we describe the relationship between a partial state space and a 
more complete one? For example, we might consider that a partial state space 
is simulated by a more complete one. Then we know that box-free modal mu-calculus 
formulas that hold of the partial state space will hold of the complete 
state space [BBLS92]. A problem with using the simulation relation for this 
purpose is that it limits the kinds of properties we can check of the partial state 
space, and does not tell us that if a property fails to hold of a partial state space 
then it also fails to hold of more complete state spaces. 

Our solution to this problem is to use models that capture explicitly the in- 
completeness of state spaces, and to use 3-valued logics to capture the possibility 
that we may not know whether a property is true or not in a partial state space. 
In this approach every formula of the logic can be checked of the partial state 
space. If the answer true or false is obtained then the answer also holds of the 
complete state space. If the answer $\bot$ (meaning “unknown”) is obtained then 
the partial state space lacks information needed for a definite answer about the 
complete state space. 

A state-based framework is adopted for most of the paper. In the next section 
we review Kripke structures and propositional modal logic. In Section 3 we 
define partial Kripke structures and the interpretation of modal logic on these 
structures. We also define a preorder on partial Kripke structures that reflects 
their degree of completeness and show that propositional modal logic (under our 
3-valued interpretation) characterizes this preorder. In Section 4 we present a 
model checker for 3-valued CTL. In Section 5 we show how our results can be 
applied to the problem of model checking partial state spaces. In Section 6 the 
main results of Section 3 are reworked in an action-based framework. In Section 7 
we present our conclusions and discuss related work. 

2 Kripke Structures and Modal Logic 

Let P be a nonempty finite set of atomic propositions. 

Definition 1. A Kripke structure $M$ is a tuple $(S, L, \mathcal{R})$, where $S$ is a set of 
states, $L : S \times P \to \{true, false\}$ is an interpretation that associates a truth 
value in $\{true, false\}$ with each atomic proposition in $P$ for each state in $S$, and 
$\mathcal{R} \subseteq S \times S$ is a transition relation on $S$. 

For technical convenience, we assume that a Kripke structure has no terminating 
state by requiring that $\mathcal{R}$ be total, i.e., that every state has an outgoing 
$\mathcal{R}$-transition. This assumption does not restrict the modeling power of the 
formalism, since we can model a terminated execution as repeating forever its last 
state by adding a self-loop to that state. Note that Kripke structures can be 
nondeterministic: a state can have more than one outgoing $\mathcal{R}$-transition. We 
also assume that the number of outgoing transitions from a state is finite. 

Temporal logics are modal logics geared towards the description of the temporal 
ordering of events [Eme90]. Propositional modal logic (e.g., see [Var97]) is 
propositional logic extended with the modal operator $\Diamond$. Propositional modal 
logic can itself be extended with a fixpoint operator to form a modal fixpoint 
logic, also referred to as the propositional $\mu$-calculus [Koz83]. This very expressive 
logic includes as fragments linear-time temporal logic (LTL) [MP92] and 
computation-tree logic (CTL) [CE81]. 
For the sake of simplicity, let us first consider propositional modal logic. More 
expressive logics will be discussed later. We now recall the syntax and semantics 
of propositional modal logic. 

Definition 2. Given a nonempty finite set $P$ of atomic propositions, formulas 
of propositional modal logic have the following abstract syntax, where $p$ ranges 
over $P$: 

$$\phi ::= p \mid \neg\phi \mid \phi_1 \wedge \phi_2 \mid \Diamond\phi$$ 

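Definition 3. The satisfaction of a formula $\phi$ of propositional modal logic in 
a state $s$ of a Kripke structure $M = (S, L, \mathcal{R})$, written $(M, s) \models \phi$, is defined 
inductively as follows: 

$$\begin{aligned}
(M, s) &\models p && \text{if } L(s, p) = true \\
(M, s) &\models \neg\phi && \text{if } (M, s) \not\models \phi \\
(M, s) &\models \phi_1 \wedge \phi_2 && \text{if } (M, s) \models \phi_1 \text{ and } (M, s) \models \phi_2 \\
(M, s) &\models \Diamond\phi && \text{if } (M, t) \models \phi \text{ for some } t \text{ such that } (s, t) \in \mathcal{R}
\end{aligned}$$

The derived modal operator $\Box$ is the dual of $\Diamond$, i.e., $\Box\phi = \neg\Diamond\neg\phi$. Thus, we have 
$(M, s) \models \Box\phi$ if $(M, t) \models \phi$ for all $t$ such that $(s, t) \in \mathcal{R}$. When $M$ is understood, 
we write $s \models \phi$ instead of $(M, s) \models \phi$. 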

Propositional modal logic can be used to define an equivalence relation on 
states of Kripke structures: two states are equivalent if they satisfy the same set 
of formulas of the logic. It is well known [HM85] that the equivalence relation 
induced in this way by propositional modal logic coincides with the notion of 
bisimulation relation [Mil89,Par81] (or more accurately, with that of zig-zag 
relation [vB84], since propositions are mentioned in the relation). 

Definition 4. Let $M_1 = (S_1, L_1, \mathcal{R}_1)$ and $M_2 = (S_2, L_2, \mathcal{R}_2)$ be Kripke structures. 
A binary relation $B \subseteq S_1 \times S_2$ is a bisimulation relation if $(s_1, s_2) \in B$ 
implies: 

— $\forall p \in P : L_1(s_1, p) = L_2(s_2, p)$, 

— if $(s_1, s_1') \in \mathcal{R}_1$, then there is some $s_2' \in S_2$ such that $(s_2, s_2') \in \mathcal{R}_2$ and 
$(s_1', s_2') \in B$, and 

— if $(s_2, s_2') \in \mathcal{R}_2$, then there is some $s_1' \in S_1$ such that $(s_1, s_1') \in \mathcal{R}_1$ and 
$(s_1', s_2') \in B$. 

Two states $s_1$ and $s_2$ are bisimilar, denoted $s_1 \sim s_2$, if they are related by some 
bisimulation relation. 

Theorem 1. [HM85] Let $M_1 = (S_1, L_1, \mathcal{R}_1)$ and $M_2 = (S_2, L_2, \mathcal{R}_2)$ be Kripke 
structures such that $s_1 \in S_1$ and $s_2 \in S_2$, and let $\Phi$ denote the set of all formulas 
of propositional modal logic. Then 

$$(\forall \phi \in \Phi : [(M_1, s_1) \models \phi] = [(M_2, s_2) \models \phi]) \iff s_1 \sim s_2.$$ 

Propositional modal logic is then called a logical characterization of $\sim$. This 
means that propositional modal logic cannot distinguish between bisimilar states, 
and that states satisfying exactly the same set of propositional modal logic 
formulas are bisimilar. 
3 Partial Kripke Structures and 3-Valued Modal Logic 

To model check partial state spaces, we need a way to model the absence of in- 
formation about the missing parts of the full state space, both operationally (in 
terms of Kripke structures) and logically (in terms of modal logics). A natural 
approach to the operational modeling of incompleteness is to model an incom- 
plete state space with a kind of partially-defined Kripke structure. We show that 
a compatible approach in the logical modeling of incompleteness is to interpret 
modal logic with a third truth value $\bot$, which is understood as “unknown”. 
More precisely, we model partial state spaces as partial Kripke structures. We 
then define a 3-valued modal logic whose semantics is defined with respect to 
partial Kripke structures. We proceed by studying an equivalence relation and 
preorder implicitly defined by this logic. As before, let $P$ be a nonempty finite 
set of atomic propositions. 

Definition 5. A partial Kripke structure $M$ is a tuple $(S, L, \mathcal{R})$, where $S$ is a 
set of states, $L : S \times P \to \{true, \bot, false\}$ is an interpretation that associates a 
truth value in $\{true, \bot, false\}$ with each atomic proposition in $P$ for each state 
in $S$, and $\mathcal{R} \subseteq S \times S$ is a transition relation on $S$. 

In interpreting propositional modal logic on partial Kripke structures, we 
interpret the operators $\wedge$ and $\neg$ using Kleene’s strongest regular 3-valued 
propositional logic [Kle87]. In this logic $\bot$ is understood as “unknown whether 
true or false”. A simple way to define conjunction (resp. disjunction) in this 
logic is as the minimum (resp. maximum) of its arguments, under the order 
$false < \bot < true$. We write min and max for these functions, and extend them 
to sets in the obvious way, with $min(\emptyset) = true$ and $max(\emptyset) = false$. We define 
negation using the function neg that maps true to false, false to true, and $\bot$ 
to $\bot$. Notice that these functions give the usual meaning of the propositional 
operators when applied to values true and false. 
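
The three Kleene operators just described can be sketched in a few lines of Python. This is only an illustration: the integer encoding and the names `k_min`, `k_max` and `k_neg` are assumptions made here, not notation from the paper.

```python
# Truth values encoded as integers so that false < unknown < true,
# matching the order used above to define conjunction and disjunction.
# (The encoding and all names below are illustrative assumptions.)
FALSE, UNKNOWN, TRUE = 0, 1, 2


def k_min(values):
    """Kleene conjunction over a collection; the empty collection gives TRUE."""
    return min(values, default=TRUE)


def k_max(values):
    """Kleene disjunction over a collection; the empty collection gives FALSE."""
    return max(values, default=FALSE)


def k_neg(value):
    """Swap TRUE and FALSE and leave UNKNOWN unchanged."""
    return TRUE - value
```

With this encoding, a conjunction with one false argument is false even when the other argument is unknown, while a conjunction of true and unknown stays unknown.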

We now consider a 3-valued propositional modal logic having the same syntax 
as propositional modal logic, and the following semantics. 

Definition 6. The truth value of a formula $\phi$ of 3-valued propositional modal 
logic in a state $s$ of a partial Kripke structure $M = (S, L, \mathcal{R})$, written 
$[(M, s) \models \phi]$, is defined inductively as follows: 

$$\begin{aligned}
[(M, s) \models p] &= L(s, p) \\
[(M, s) \models \neg\phi] &= neg([(M, s) \models \phi]) \\
[(M, s) \models \phi_1 \wedge \phi_2] &= min([(M, s) \models \phi_1], [(M, s) \models \phi_2]) \\
[(M, s) \models \Diamond\phi] &= max(\{[(M, t) \models \phi] \mid (s, t) \in \mathcal{R}\})
\end{aligned}$$

We again define $\Box$ as the dual of $\Diamond$, so $[(M, s) \models \Box\phi] = min(\{[(M, t) \models \phi] \mid 
(s, t) \in \mathcal{R}\})$. This semantics gives the usual meaning of the propositional and 
modal operators when applied to complete Kripke structures. 
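
As a minimal sketch of Definition 6, the following Python function evaluates a formula recursively on a finite partial Kripke structure. The tuple encoding of formulas, the dictionary representation of $L$ and $\mathcal{R}$, and the function name `evaluate` are all assumptions made for illustration.

```python
# Truth values ordered false < unknown < true (illustrative encoding).
FALSE, UNKNOWN, TRUE = 0, 1, 2


def evaluate(formula, state, labels, succ):
    """Return the 3-valued truth value of `formula` at `state`.

    Formulas are nested tuples (a hypothetical encoding):
      ("atom", p), ("not", f), ("and", f, g), ("dia", f), ("box", f).
    `labels[state][p]` is the value of proposition p at the state and
    `succ[state]` lists the successors of the state.
    """
    op = formula[0]
    if op == "atom":
        return labels[state][formula[1]]
    if op == "not":
        return TRUE - evaluate(formula[1], state, labels, succ)
    if op == "and":
        return min(evaluate(formula[1], state, labels, succ),
                   evaluate(formula[2], state, labels, succ))
    if op == "dia":  # maximum over all successors
        return max((evaluate(formula[1], t, labels, succ) for t in succ[state]),
                   default=FALSE)
    if op == "box":  # minimum over all successors (dual of "dia")
        return min((evaluate(formula[1], t, labels, succ) for t in succ[state]),
                   default=TRUE)
    raise ValueError(f"unknown operator: {op!r}")


# A small structure: s has successors t and u; p is unknown at u.
labels = {"s": {"p": TRUE}, "t": {"p": TRUE}, "u": {"p": UNKNOWN}}
succ = {"s": ["t", "u"], "t": ["t"], "u": ["u"]}
p = ("atom", "p")
```

On this structure the diamond formula over `p` is true at `s` because one successor definitely satisfies `p`, whereas the box formula is unknown because the value of `p` at `u` is not known.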

This 3-valued propositional modal logic can be used to define a preorder 
on partial Kripke structures that reflects their degree of completeness. Let $\le$ 
be the ordering on truth values such that $\bot \le true$, $\bot \le false$, $x \le x$ (for 
all $x \in \{true, \bot, false\}$), and $x \not\le y$ otherwise. Note that the operators neg, 
min and max are monotonic with respect to $\le$: if $x \le x'$ and $y \le y'$, we have 
$neg(x) \le neg(x')$, $min(x, y) \le min(x', y')$, and $max(x, y) \le max(x', y')$. This 
property is important to prove the results that follow. 

Definition 7. Let $M_1 = (S_1, L_1, \mathcal{R}_1)$ and $M_2 = (S_2, L_2, \mathcal{R}_2)$ be partial Kripke 
structures. The completeness preorder $\preceq$ is the greatest relation $\preceq\ \subseteq S_1 \times S_2$ such 
that $s_1 \preceq s_2$ implies the following: 

— $\forall p \in P : L_1(s_1, p) \le L_2(s_2, p)$; 

— if $(s_1, s_1') \in \mathcal{R}_1$, then there is some $s_2' \in S_2$ such that $(s_2, s_2') \in \mathcal{R}_2$ and 
$s_1' \preceq s_2'$; and 

— if $(s_2, s_2') \in \mathcal{R}_2$, then there is some $s_1' \in S_1$ such that $(s_1, s_1') \in \mathcal{R}_1$ and 
$s_1' \preceq s_2'$. 

Intuitively, $s_1 \preceq s_2$ means that $s_1$ and $s_2$ are “nearly bisimilar” except that the 
atomic propositions in state $s_1$ may be less defined than in state $s_2$. Obviously, 
$s_1 \sim s_2$ implies $s_1 \preceq s_2$. 
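
For finite structures, the greatest relation of Definition 7 can be computed by starting from all pairs that satisfy the labeling condition and repeatedly discarding pairs that violate one of the two transition conditions. The sketch below assumes finite structures given as dictionaries; the function names and the string encoding of truth values are illustrative only, and no attempt is made at efficiency.

```python
FALSE, UNKNOWN, TRUE = "false", "unknown", "true"


def less_defined(x, y):
    """Information ordering on truth values: unknown lies below both definite values."""
    return x == y or x == UNKNOWN


def completeness_preorder(labels1, succ1, labels2, succ2):
    """Greatest relation satisfying the three conditions of Definition 7.

    Assumes both structures are finite and use the same proposition names.
    Returns the set of pairs (s1, s2) with s1 below s2.
    """
    # Start from every pair that satisfies the condition on atomic propositions.
    rel = {(s1, s2)
           for s1 in labels1 for s2 in labels2
           if all(less_defined(labels1[s1][p], labels2[s2][p]) for p in labels1[s1])}
    changed = True
    while changed:  # refine until no pair violates a transition condition
        changed = False
        for (s1, s2) in list(rel):
            forth = all(any((t1, t2) in rel for t2 in succ2[s2]) for t1 in succ1[s1])
            back = all(any((t1, t2) in rel for t1 in succ1[s1]) for t2 in succ2[s2])
            if not (forth and back):
                rel.discard((s1, s2))
                changed = True
    return rel


# Partial structure: a0 leads to a state about which nothing is known.
part_labels = {"a0": {"p": TRUE}, "bot": {"p": UNKNOWN}}
part_succ = {"a0": ["bot"], "bot": ["bot"]}
# A more complete structure in which the successor is fully defined.
full_labels = {"c0": {"p": TRUE}, "c1": {"p": FALSE}}
full_succ = {"c0": ["c1"], "c1": ["c1"]}
```

In this example `a0` is below `c0`, but not the other way round, since the definite value of `p` at `c1` cannot be matched by the unknown value at `bot`. Replacing `less_defined` by plain equality would give the greatest bisimulation instead.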

The following theorem shows how the completeness preorder can be logically 
characterized with 3-valued propositional modal logic. 

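Theorem 2. Let $M_1 = (S_1, L_1, \mathcal{R}_1)$ and $M_2 = (S_2, L_2, \mathcal{R}_2)$ be partial Kripke 
structures such that $s_1 \in S_1$ and $s_2 \in S_2$, and let $\Phi$ denote the set of all formulas 
of 3-valued propositional modal logic. Then 

$$(\forall \phi \in \Phi : [(M_1, s_1) \models \phi] \le [(M_2, s_2) \models \phi]) \iff s_1 \preceq s_2.$$ 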

Proof. Proofs of theorems are omitted in this extended abstract because of space 
constraints. 

In other words, partial Kripke structures that are “more complete” with respect 
to $\preceq$ have more definite properties with respect to $\le$, i.e., have more properties 
that are either true or false. Moreover, any formula $\phi$ of 3-valued propositional 
modal logic that evaluates to true or false on a partial Kripke structure has the 
same truth value when evaluated on every more complete structure. 

Formulas that evaluate to $\bot$ on a partial Kripke structure must be evaluated 
on a more complete structure to get a definite answer. Obviously, any partial 
Kripke structure can be completed to obtain a traditional fully-defined Kripke 
structure, where $\phi$ always evaluates to either true or false. Some partial Kripke 
structures can only be completed to form Kripke structures that all satisfy the 
property $\phi$, or to form Kripke structures that all violate $\phi$ (this is the case, for 
instance but not exclusively, when $\phi$ is a tautology or is unsatisfiable with a 
2-valued interpretation). Some other partial Kripke structures can be completed 
to form Kripke structures that satisfy $\phi$ as well as Kripke structures that violate 
$\phi$. Note that checking a formula $\phi$ on a partial Kripke structure may return $\bot$ 
even if $\phi$ is a tautology or is unsatisfiable in the 2-valued interpretation. 

The following theorem states that 3-valued propositional modal logic logically 
characterizes the equivalence relation induced by the completeness preorder $\preceq$. 

Theorem 3. Let $M_1 = (S_1, L_1, \mathcal{R}_1)$ and $M_2 = (S_2, L_2, \mathcal{R}_2)$ be partial Kripke 
structures such that $s_1 \in S_1$ and $s_2 \in S_2$, and let $\Phi$ denote the set of all formulas 
of 3-valued propositional modal logic. Then 

$$(\forall \phi \in \Phi : [(M_1, s_1) \models \phi] = [(M_2, s_2) \models \phi]) \iff (s_1 \preceq s_2 \text{ and } s_2 \preceq s_1).$$ 

The bisimulation relation of Definition 4 can be applied directly to partial 
Kripke structures. Two states $s_1$ and $s_2$ of partial Kripke structures are bisimilar, 
denoted $s_1 \sim s_2$, if they are related by some bisimulation relation. Since $\sim$ is a 
stronger relation than $\preceq$, $s_1 \sim s_2$ implies both $s_1 \preceq s_2$ and $s_2 \preceq s_1$, and hence 
3-valued propositional modal logic cannot distinguish between bisimilar states. 

However, the converse is not true: $s_1 \preceq s_2$ and $s_2 \preceq s_1$ does not imply 
$s_1 \sim s_2$. This is illustrated by the example below. The existence of such an 
example proves that, in contrast with 2-valued propositional modal logic, 3-valued 
propositional modal logic is not a logical characterization of bisimulation 
as defined in Definition 4. 

Example 1, Here is an example of two non-bisimilar states that cannot be di- 
stinguished by any formula of 3-valued propositional modal logic. 

[Figure: two partial Kripke structures with initial states $s_0$ and $s_0'$.] 




These two partial Kripke structures have two atomic propositions p and g, 
whose truth value is defined in each state as indicated in the figure by a pair of 
the form {p^q). We have the following relations: 

“ -^2 ^ -^2 '^2 ^ '^2 7 

~ '^3 ^ 3 £^nd ^3 ^ S3, 

“ '^1 ^ '^2 '^3 ^ ^ '^2 S3 ^ 

~ '^0 ^ '^0 — '^ 0 * 

We have that $s_0 \preceq s_0'$ and $s_0' \preceq s_0$, but $s_0 \not\sim s_0'$ since $s_1$ is not bisimilar to any 
state in the second partial Kripke structure. 



4 A Model Checker for 3-Valued CTL 

We have so far focused on modal propositional logic because it is a simple context 
in which to present our ideas. However, this logic cannot express even simple sa- 
fety properties. In this section we present a 3-valued semantics for computation- 
tree logic (CTL) [CE81] as well as a model-checking algorithm. We consider CTL 
because it extends the expressiveness of modal propositional logic, the model- 
checking algorithm for the standard 2-valued interpretation is well known, and 
because it is expressive enough to specify many interesting properties. 

Our algorithm is based on the algorithm of [CES86]. We focus here, as in 
[CES86], on formulas of the form $A(f_1 \, U \, f_2)$, since the model checking of other 
CTL formulas is either similar or much simpler. Formula $A(f_1 \, U \, f_2)$ holds of a 
state in a Kripke structure if, along all paths from the state, there exists a state 
in the path for which $f_2$ holds and for which $f_1$ holds of all previous states in 
the path. 

The semantics of CTL is given as an inductive definition of the satisfaction 
relation $\models$ between a Kripke structure $M = (S, L, \mathcal{R})$ and a CTL formula. The 
clause of the definition of $\models$ for formula $A(f_1 \, U \, f_2)$ reads 

$$s_0 \models A(f_1 \, U \, f_2) \text{ iff for all paths } (s_0, s_1, \ldots),\ \exists i \ge 0 : s_i \models f_2 \wedge \forall j : 0 \le j < i \Rightarrow s_j \models f_1$$ 

(Here the Kripke structure $M$ is understood.) To see if a state satisfies $A(f_1 \, U \, f_2)$, 
the procedure au of the CTL model checker of [CES86] works roughly as follows. 
It is called with a formula $f$ of the form $A(f_1 \, U \, f_2)$, a state $s_0$, and a result variable 
$b$. It is assumed that when au is called a state is labeled with $f_1$ just if it 
satisfies $f_1$, and similarly for $f_2$. A depth-first search of the states reachable from 
$s_0$ is then made, with states marked as they are visited. Initially, if $s_0$ is labeled 
with $f_2$, then $s_0$ is labeled with $A(f_1 \, U \, f_2)$ and the procedure terminates with $b$ 
set to true. Otherwise if $s_0$ is not labeled with $f_1$, then the procedure terminates 
with $b$ set to false. Otherwise procedure au is called recursively on all successors 
of $s_0$. If a recursive call is made to a state that is already marked, the procedure 
terminates with $b$ set to false. 

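For 3-valued CTL we define $A(f_1 \, U \, f_2)$ as follows: 

$$\begin{aligned}
[s_0 \models A(f_1 \, U \, f_2)] &= min(\{[(s_0, s_1, \ldots) \models f_1 \, U \, f_2] \mid (s_0, s_1, \ldots) \text{ a path}\}) \\
[(s_0, s_1, \ldots) \models f_1 \, U \, f_2] &= max(\{[(s_0, s_1, \ldots) \models f_1 \, U_i \, f_2] \mid i \ge 0\}) \\
[(s_0, s_1, \ldots) \models f_1 \, U_i \, f_2] &= min(min(\{[s_j \models f_1] \mid j < i\}), [s_i \models f_2])
\end{aligned}$$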

The min operators in this definition correspond to conjunction and universal 
quantification in the definition for 2-valued CTL above, and similarly the max 
operator in this definition corresponds to existential quantification in the defini- 
tion above. The two definitions agree on complete Kripke structures. 

Consider the problem of adapting the procedure au of the CTL model checker 
of [CES86] to the 3-valued case. One way in which the algorithm becomes more 
complicated is in checking a state for the first time. Suppose our partial Kripke 
structure has only a single path, and that the value of formulas $f_1$ and $f_2$ for 
the first three states on the path are as follows: 

$s_0 : (f_1 = true, f_2 = \bot)$, $s_1 : (f_1 = true, f_2 = false)$, $s_2 : (f_1 = \bot, f_2 = true)$ 

We see that $f_2$ is $\bot$ at $s_0$, but we cannot conclude immediately that $A(f_1 \, U \, f_2)$ 
is $\bot$ at $s_0$ because we may find (and do, in this example) that $f_2$ is true at a 
later state. However, if $f_1$ were $\bot$ or false at state $s_1$, then we could conclude 
that $A(f_1 \, U \, f_2)$ is $\bot$ at $s_0$. 
Determining the result when a cycle is detected also becomes more complicated. 
In the 2-valued algorithm we know that $f_1$ holds but $f_2$ does not along 
all states in a cycle. In the 3-valued case, for states in a cycle it may be that $f_1$ 
is true and $f_2$ is false, or $f_1$ is true and $f_2$ is $\bot$, or $f_1$ is $\bot$ and $f_2$ is false. 

Figure 1 shows our modified version of the procedure au of [CES86]. The 
idea behind the algorithm is to check the partial Kripke structure twice. First 
check the structure under a pessimistic interpretation, in which the value $\bot$ is 
understood as false. If the result of the check is true then return true as the 
3-valued result. Then check the structure under an optimistic interpretation, in 
which the value $\bot$ is understood as true. If the result of this check is false then 
return false. Otherwise return $\bot$. 
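
The two-pass idea can be sketched directly in Python. Note that this is not the merged procedure of Figure 1: it runs two separate 2-valued checks, and the 2-valued check is a simple least-fixpoint computation rather than the depth-first search of [CES86]. The function names, the string encoding of truth values and the data layout are assumptions made for illustration, and the transition relation is assumed to be finite and total.

```python
FALSE, UNKNOWN, TRUE = "false", "unknown", "true"


def au_two_valued(states, succ, sat_f1, sat_f2):
    """Set of states satisfying A(f1 U f2), computed as a least fixpoint.

    `sat_f1` and `sat_f2` are the sets of states where f1 and f2 hold.
    The transition relation `succ` is assumed to be total.
    """
    sat = set(sat_f2)
    changed = True
    while changed:
        changed = False
        for s in states:
            if (s not in sat and s in sat_f1 and succ[s]
                    and all(t in sat for t in succ[s])):
                sat.add(s)
                changed = True
    return sat


def au_three_valued(states, succ, val_f1, val_f2, s0):
    """3-valued value of A(f1 U f2) at s0 from 3-valued values of f1 and f2."""
    # Pessimistic pass: unknown is read as false.
    pess = au_two_valued(states, succ,
                         {s for s in states if val_f1[s] == TRUE},
                         {s for s in states if val_f2[s] == TRUE})
    if s0 in pess:
        return TRUE
    # Optimistic pass: unknown is read as true.
    opt = au_two_valued(states, succ,
                        {s for s in states if val_f1[s] != FALSE},
                        {s for s in states if val_f2[s] != FALSE})
    if s0 not in opt:
        return FALSE
    return UNKNOWN


# The single-path example from the text, closed with a self-loop on s2
# (the self-loop is an assumption added to keep the relation total).
states = ["s0", "s1", "s2"]
succ = {"s0": ["s1"], "s1": ["s2"], "s2": ["s2"]}
val_f1 = {"s0": TRUE, "s1": TRUE, "s2": UNKNOWN}
val_f2 = {"s0": UNKNOWN, "s1": FALSE, "s2": TRUE}
```

On the single-path example the pessimistic pass already succeeds at `s0`, so the result is true; changing the value of `f1` at `s1` to unknown makes the pessimistic pass fail while the optimistic pass still succeeds, giving unknown.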

Our algorithm merges these two checks so that they can be run at the same 
time. Procedure au now takes an additional argument $mode \subseteq \{p, o\}$ and returns 
a pair $(v_p, v_o)$. If mode contains constant p (resp. o) then value $v_p$ ($v_o$) is the 
result of the pessimistic (resp. optimistic) search. If mode does not contain p (resp. o) then $v_p$ 
($v_o$) is false. When $mode = \{p, o\}$ we interpret the returned pairs (false, false), 
(false, true), and (true, true) in a 3-valued sense as false, $\bot$, and true, respectively. 
Notice that an optimistic check must give true if a pessimistic one does, so result 
(true, false) is impossible. 

In our algorithm states have distinct optimistic and pessimistic labelings. 
Function labeled(s, f, i) returns true just if state $s$ has label $i \in \{p, o\}$ for formula 
$f$. Function add_label_i(s, f, i) gives state $s$ label $i$ for formula $f$. Function 
label(s, f, mode) returns a pair $(v_p, v_o)$ where $v_i$ is false if $i \notin mode$ and 
labeled(s, f, i) otherwise. The operator $\vee$ on pairs of truth values is defined by 
$(x, y) \vee (w, z) = (x \vee w, y \vee z)$. The order $<$ on pairs of truth values is defined 
by (false, false) < (false, true) < (true, false) < (true, true). 

States have distinct optimistic and pessimistic markings as well. Function 
marked(s) returns the set of interpretations for which state $s$ is marked. Function 
add_mark(s, A), where $A \subseteq \{p, o\}$, marks $s$ with each interpretation in $A$. 

The proof of correctness of our model-checking algorithm is omitted in this 
extended abstract. Our modified procedure, like the original one, requires time 
$O(card(S) + card(\mathcal{R}))$, and thus the overall complexity of the resulting 3-valued 
CTL model-checking algorithm is still $O(length(\phi) \times (card(S) + card(\mathcal{R})))$. 

5 Applications 

We now discuss how to exploit the results of the previous sections in practice. 
Consider a (possibly infinite) complete state space modeled as a Kripke structure 
$M = (S, L, \mathcal{R})$. Imagine that this state space is so large that only part of it can 
be explored. We now present a simple construction to define a partial Kripke 
structure $M' = (S', L', \mathcal{R}')$ representing only the explored states and transitions 
of this state space. 

Let $S_e \subseteq S$ be the set of explored states and $\mathcal{R}_e \subseteq \mathcal{R}$ be the set of explored 
transitions. For this application, we can assign the value $\bot$ to all the atomic 
propositions in each unexplored state $s \in S \setminus S_e$. Since unexplored states are 
indistinguishable with this model, a single state $s_\bot$ of $S'$ is enough to model 
all of them. For every unexplored transition $(s, t) \in \mathcal{R} \setminus \mathcal{R}_e$ such that $s \in S_e$, 




G. Bruns and P. Godefroid 



procedure au(f, s, b, mode) 
begin 
  parent_mode := mode; 
  mode := mode - marked(s); 
  add_mark(s, mode); 
  temp_mode := mode; 
  for all i in temp_mode do 
  begin 
    if labeled(s, f2, i) then 
      begin add_label_i(s, f, i); mode := mode - {i} end 
    else if ¬labeled(s, f1, i) then 
      mode := mode - {i} 
  end; 
  if mode = ∅ then 
    begin b := label(s, f, parent_mode); return end; 
  push(s, ST); 
  min := (true, true); 
  for all s1 ∈ successors(s) do 
  begin 
    au(f, s1, b1, mode); 
    if b1 < min then min := b1; 
    if min = (false, false) then break 
  end; 
  pop(ST); 
  b := min ∨ label(s, f, parent_mode - mode); 
  add_label(s, f, b); 
  return 
end 



Fig. 1. Procedure au of model-checking algorithm for 3-valued CTL 



we add a transition $(s, s_\bot)$ in $\mathcal{R}'$ to model that we do not know where this 
unexplored transition leads. To preserve our assumption that $\mathcal{R}'$ is total, we also 
assume there is a transition $(s_\bot, s_\bot)$ in $\mathcal{R}'$. However, this is the only outgoing 
$\mathcal{R}'$-transition of $s_\bot$, modeling that unexplored states cannot lead back to explored 
states. In summary, we have the following: 



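$$\begin{aligned}
S' &= S_e \cup \{s_\bot\} \\
L'(s, p) &= \begin{cases} L(s, p) & \text{if } s \in S_e \\ \bot & \text{if } s = s_\bot \end{cases} \\
\mathcal{R}' &= \mathcal{R}_e \cup \{(s, s_\bot) \mid s \in S_e \text{ and } (s, t) \in \mathcal{R} \setminus \mathcal{R}_e\} \cup \{(s_\bot, s_\bot)\}
\end{aligned}$$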



Let us assume that $M$ has initial state $s_0$, and that $s_0$ is explored and denoted 
by $s_0'$ in $S'$. It is easy to prove the following. 

Theorem 4. Let Kripke structure $M = (S, L, \mathcal{R})$ with initial state $s_0$ represent 
a complete state space, and let $M' = (S', L', \mathcal{R}')$ be a partial Kripke structure 
built from $M$ by the construction above. Then 

$$s_0' \preceq s_0.$$ 

Theorems 2 and 4 together guarantee that any formula $\phi$ of 3-valued propositional 
modal logic that evaluates to true or false on a partial state space defined 
with the construction above has the same truth value when evaluated on the 
corresponding complete state space. 
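
The construction of $M'$ can be sketched as follows for a finite, explicitly represented state space. The function name `partial_structure`, the state name `"s_bot"` and the dictionary layout are hypothetical choices made for this illustration; the sketch also assumes that every explored transition connects two explored states.

```python
UNKNOWN = "unknown"
BOTTOM = "s_bot"  # hypothetical name for the single state standing for all unexplored states


def partial_structure(labels, succ, explored_states, explored_transitions):
    """Build the labeling and successor map of the partial structure.

    `labels[s][p]` and `succ[s]` describe the complete structure;
    `explored_transitions` is a set of pairs (s, t) of explored states.
    """
    props = {p for lab in labels.values() for p in lab}
    # Explored states keep their labels; the extra state knows nothing.
    new_labels = {s: dict(labels[s]) for s in explored_states}
    new_labels[BOTTOM] = {p: UNKNOWN for p in props}
    # The extra state only loops back to itself, keeping the relation total.
    new_succ = {s: set() for s in explored_states}
    new_succ[BOTTOM] = {BOTTOM}
    for s in explored_states:
        for t in succ[s]:
            if (s, t) in explored_transitions and t in explored_states:
                new_succ[s].add(t)       # explored transition is kept
            else:
                new_succ[s].add(BOTTOM)  # unexplored transition is redirected
    return new_labels, new_succ


# Complete structure with three states; only a, b and two transitions are explored.
labels = {"a": {"p": "true"}, "b": {"p": "false"}, "c": {"p": "true"}}
succ = {"a": {"b", "c"}, "b": {"b"}, "c": {"c"}}
new_labels, new_succ = partial_structure(
    labels, succ, {"a", "b"}, {("a", "b"), ("b", "b")})
```

Here the unexplored transition from `a` to `c` is replaced by a transition to the extra state, whose only proposition `p` is unknown.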

Example 2. Consider the three following partial Kripke structures with a single 
atomic proposition p, whose truth value is defined in each state as indicated in 
the figure. 

[Figure: three partial Kripke structures with top states $s_1$, $s_2$ and $s_3$.] 



The formula $A(true \, U \, p)$ of 3-valued CTL has a different truth value in each 
of the top states of these partial Kripke structures: $[s_1 \models A(true \, U \, p)] = true$, 
$[s_2 \models A(true \, U \, p)] = \bot$, and $[s_3 \models A(true \, U \, p)] = false$. 

An important application of our framework is thus to make it possible to cope 
with missing parts of the state space during model checking and still obtain a 
definite answer when this is possible. In the case of CTL properties, the algorithm 
of the previous section captures exactly when this is possible and how to do it, for 
any CTL formula and any partial Kripke structure. 

Other possible applications for our framework include the evaluation of heu- 
ristics for guiding the search and pruning state spaces (one can determine which 
heuristics more often give definite answers for which properties), and the analy- 
sis of systems containing state variables whose values cannot be read (and hence 
are unknown) at some points during the execution of the system. 

6 An Action-Based Approach to Partial State Spaces 

In this section we revisit the main results of Section 3 in an action-based frame- 
work, where system behavior is modeled as a labeled transition system rather 
than a Kripke structure. Here the focus is on how a system responds to events (or
actions), which are modeled as transition labels, rather than on the propositions
that hold of system states.

To capture the incompleteness of a state space, a transition system can be
labeled with a divergence predicate $\uparrow$ [Mil81]. If the divergence predicate
holds for label $a$ at state $p$ then intuitively some of the $a$ transitions from
$p$ in the full state space may be missing at $p$ in the partial state space. Note
that this predicate takes a value of either true or false for each state and
label, while the atomic propositions of partial Kripke structures are 3-valued.










Definition 8. An extended transition system is a structure
$(S, A, \{\xrightarrow{a} \mid a \in A\}, \uparrow)$ where $S$ is a set of states, $A$ is a
set of labels, $\{\xrightarrow{a} \mid a \in A\}$ is a family of transition relations,
and $\uparrow\ \subseteq S \times A$ is a divergence relation.

We write $p \xrightarrow{a} q$ if $(p, q) \in\ \xrightarrow{a}$. We write $p \uparrow a$ if
$(p, a) \in\ \uparrow$ and say that $p$ diverges for $a$. Also, we write $p \downarrow a$ if
not $p \uparrow a$ and say that $p$ converges for $a$.

The degree to which a state space is complete is modeled here by the divergence
preorder [Mil81,Wal90] (also known as the partial bisimulation preorder),
which is a generalization of the simulation and bisimulation relations. If a pair
$(p, q)$ is in this relation, then $q$ must be able to match every transition of $p$.
Furthermore, if $p$ is convergent for label $a$, then $p$ must be able to match every
$a$ transition of $q$.

Definition 9. The divergence preorder $\sqsubseteq$ is the greatest binary relation on
states of an extended transition system such that $p \sqsubseteq q$ implies:

- whenever $p \xrightarrow{a} p'$ there exists a $q'$ such that $q \xrightarrow{a} q'$ and
  $p' \sqsubseteq q'$, and

- if $p \downarrow a$, then $q \downarrow a$ and whenever $q \xrightarrow{a} q'$ there exists a $p'$
  such that $p \xrightarrow{a} p'$ and $p' \sqsubseteq q'$.
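On a finite extended transition system, the greatest relation of Definition 9 can be computed by starting from all pairs of states and repeatedly discarding pairs that violate one of the two clauses. The sketch below is illustrative only; the function name, the encoding of transitions as triples, and the three-state example are assumptions rather than part of the paper.

```python
from itertools import product

def divergence_preorder(states, labels, trans, diverges):
    """trans: set of (p, a, p2) triples; diverges: set of (p, a) pairs.

    Returns the set of pairs (p, q) in the divergence preorder.
    """
    def succ(p, a):
        return {t for (s, b, t) in trans if s == p and b == a}

    rel = set(product(states, states))  # start from the full relation
    changed = True
    while changed:
        changed = False
        for (p, q) in list(rel):
            ok = True
            for a in labels:
                # Clause 1: q matches every a transition of p.
                if not all(any((p2, q2) in rel for q2 in succ(q, a)) for p2 in succ(p, a)):
                    ok = False
                # Clause 2: if p converges for a, so does q, and p matches q.
                if (p, a) not in diverges:
                    if (q, a) in diverges:
                        ok = False
                    elif not all(any((p2, q2) in rel for p2 in succ(p, a)) for q2 in succ(q, a)):
                        ok = False
            if not ok:
                rel.discard((p, q))
                changed = True
    return rel

# u: nothing known yet (no transitions, diverges for a)
# v: one a transition to w, fully known; w: no transitions, fully known
states = {"u", "v", "w"}
rel = divergence_preorder(states, {"a"}, {("v", "a", "w")}, {("u", "a")})
```

In this example the unexplored state u is below both v and w, while v and w are unrelated to each other.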

This preorder differs from the completeness preorder of Section 3 because
the divergence predicate specifically captures the possibility that transitions are
missing at a state, while in partial Kripke structures an atomic proposition with
value $\bot$ may not represent this possibility.

Hennessy-Milner Logic (HML) [HM85] is a propositional modal logic for
labeled transition systems. Formulas of HML have the following abstract syntax:

$$\phi ::= tt \mid \neg\phi \mid \phi_1 \wedge \phi_2 \mid \langle a \rangle \phi$$

where $a$ ranges over $A$. We use the standard propositional abbreviations,
including ff and $\vee$, plus the derived modal operator $[a]$, defined by
$[a]\phi = \neg\langle a \rangle\neg\phi$.

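We give the following 3-valued interpretation of HML formulas. The truth
value of an HML formula $\phi$ for a state $p$, written $[p \models \phi]$, is defined
inductively as follows:

$$[p \models tt] = true$$

$$[p \models \neg\phi] = neg([p \models \phi])$$

$$[p \models \phi_1 \wedge \phi_2] = min([p \models \phi_1], [p \models \phi_2])$$

$$[p \models \langle a \rangle \phi] = \begin{cases} max(\{[p' \models \phi] \mid p \xrightarrow{a} p'\}) & \text{if } p \downarrow a \\ max(\{[p' \models \phi] \mid p \xrightarrow{a} p'\} \cup \{\bot\}) & \text{otherwise} \end{cases}$$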
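The inductive definition above translates directly into a recursive evaluator. The following is a minimal sketch, not the authors' implementation: truth values are encoded as 0, 0.5 and 1 so that min, max and neg become arithmetic, formulas are nested tuples, and the maximum over an empty set of successors is assumed to be false.

```python
FALSE, BOT, TRUE = 0.0, 0.5, 1.0  # ordered so that min and max act on truth

def ev(p, phi, trans, diverges):
    """3-valued truth value of HML formula phi at state p.

    phi is one of ("tt",), ("not", f), ("and", f, g), ("dia", a, f).
    trans: set of (p, a, p2) triples; diverges: set of (p, a) pairs.
    """
    tag = phi[0]
    if tag == "tt":
        return TRUE
    if tag == "not":
        return 1.0 - ev(p, phi[1], trans, diverges)
    if tag == "and":
        return min(ev(p, phi[1], trans, diverges), ev(p, phi[2], trans, diverges))
    # tag == "dia": the modal operator <a>
    a, body = phi[1], phi[2]
    vals = [ev(t, body, trans, diverges) for (s, b, t) in trans if s == p and b == a]
    if (p, a) in diverges:
        vals.append(BOT)  # some a transitions may be missing at p
    return max(vals, default=FALSE)  # assumption: max of the empty set is false

def box(a, f):
    """Derived operator [a]f, defined as not <a> not f."""
    return ("not", ("dia", a, ("not", f)))

TT = ("tt",)
FF = ("not", TT)
trans = {("v", "a", "w")}
diverges = {("u", "a")}  # u has no known a transitions but may have some
```

With this system, `<a>tt` is unknown at u, true at v, and false at w, matching the intuition that only u has missing information.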



HML, under our 3-valued interpretation, characterizes the divergence
preorder on extended transition systems.

Theorem 5. Let $(S, A, \{\xrightarrow{a} \mid a \in A\}, \uparrow)$ be an extended transition
system, such that the set $\{p' \mid p \xrightarrow{a} p'\}$ is finite for all $p$ in $S$ and
$a$ in $A$. Let $p$ and $q$ be states in $S$. Then

$$(\forall \phi : [p \models \phi] \leq [q \models \phi]) \text{ iff } p \sqsubseteq q.$$

Our 3-valued interpretation of HML has a close connection to the intuitionistic
interpretation by Plotkin. In [Sti87] Plotkin’s interpretation is presented and it
is shown that the logic characterizes the divergence preorder. A positive form of
HML is used there, with syntax

$$\phi ::= tt \mid ff \mid \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid \langle a \rangle \phi \mid [a]\phi$$

Negation is not present, but the complementary forms of tt, $\wedge$, and
$\langle a \rangle$ are included. The intuitionistic semantics of this logic is like that
of standard 2-valued HML, except for the $[a]$ operator. The intuitionistic
semantics of the two modal operators are:

$$p \models_I \langle a \rangle \phi \text{ if } \exists p' : p \xrightarrow{a} p' \text{ and } p' \models_I \phi$$

$$p \models_I [a]\phi \text{ if } p \downarrow a \text{ and } \forall p' : p \xrightarrow{a} p' \Rightarrow p' \models_I \phi$$

These two operators are no longer duals, unlike the standard 2-valued
interpretation and our 3-valued interpretation. For example, if process $p$ has
no $a$ transitions and $p \uparrow a$ then $p \not\models_I [a]\,ff$ and
$p \not\models_I \langle a \rangle\,tt$.

The precise connection between this interpretation and our 3-valued
interpretation is as follows. We define the syntactic complement comp($\phi$) of a
positive HML formula $\phi$ as follows: comp(tt) = ff, comp(ff) = tt,
comp($\phi_1 \wedge \phi_2$) = comp($\phi_1$) $\vee$ comp($\phi_2$),
comp($\phi_1 \vee \phi_2$) = comp($\phi_1$) $\wedge$ comp($\phi_2$),
comp($\langle a \rangle\phi$) = $[a]$comp($\phi$), and comp($[a]\phi$) = $\langle a \rangle$comp($\phi$).
Then our 3-valued interpretation gives the result $\bot$ for $p$ and $\phi$ just if
both $\phi$ and comp($\phi$) fail to hold for $p$.

Theorem 6. Let $\phi$ be a formula of positive HML and let $p$ be a state of an
extended transition system. Then the following all hold:

1. $[p \models \phi] = true$ iff $p \models_I \phi$

2. $[p \models \phi] = false$ iff $p \models_I$ comp($\phi$)

3. $[p \models \phi] = \bot$ iff $p \not\models_I \phi$ and $p \not\models_I$ comp($\phi$)
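The statement of Theorem 6 can be exercised on a small example. The sketch below is an illustration, not a proof: it implements comp, the intuitionistic satisfaction relation, and the 3-valued interpretation for positive HML, then compares them on all formulas up to nesting depth 2 over a single label. The tuple encoding, the helper names, the four-state system, and the treatment of $[a]$ as a minimum (obtained from its definition as the dual of $\langle a \rangle$) are assumptions made here.

```python
from itertools import product

FALSE, BOT, TRUE = 0.0, 0.5, 1.0

def succ(p, a, trans):
    return [t for (s, b, t) in trans if s == p and b == a]

def comp(phi):
    """Syntactic complement of a positive HML formula."""
    tag = phi[0]
    if tag == "tt":
        return ("ff",)
    if tag == "ff":
        return ("tt",)
    if tag == "and":
        return ("or", comp(phi[1]), comp(phi[2]))
    if tag == "or":
        return ("and", comp(phi[1]), comp(phi[2]))
    if tag == "dia":
        return ("box", phi[1], comp(phi[2]))
    return ("dia", phi[1], comp(phi[2]))  # tag == "box"

def sat(p, phi, trans, div):
    """Intuitionistic satisfaction: [a] additionally requires convergence."""
    tag = phi[0]
    if tag in ("tt", "ff"):
        return tag == "tt"
    if tag == "and":
        return sat(p, phi[1], trans, div) and sat(p, phi[2], trans, div)
    if tag == "or":
        return sat(p, phi[1], trans, div) or sat(p, phi[2], trans, div)
    nxt = [sat(t, phi[2], trans, div) for t in succ(p, phi[1], trans)]
    if tag == "dia":
        return any(nxt)
    return (p, phi[1]) not in div and all(nxt)

def ev3(p, phi, trans, div):
    """3-valued interpretation, with [a] as the dual of <a>."""
    tag = phi[0]
    if tag in ("tt", "ff"):
        return TRUE if tag == "tt" else FALSE
    if tag == "and":
        return min(ev3(p, phi[1], trans, div), ev3(p, phi[2], trans, div))
    if tag == "or":
        return max(ev3(p, phi[1], trans, div), ev3(p, phi[2], trans, div))
    vals = [ev3(t, phi[2], trans, div) for t in succ(p, phi[1], trans)]
    if (p, phi[1]) in div:
        vals.append(BOT)
    return max(vals, default=FALSE) if tag == "dia" else min(vals, default=TRUE)

def formulas(depth):
    """All positive formulas over label "a" up to the given nesting depth."""
    if depth == 0:
        return [("tt",), ("ff",)]
    sub = formulas(depth - 1)
    out = list(sub)
    out += [(op, f, g) for op in ("and", "or") for f, g in product(sub, sub)]
    out += [(m, "a", f) for m in ("dia", "box") for f in sub]
    return out

trans = {("v", "a", "w"), ("x", "a", "w")}
div = {("u", "a"), ("x", "a")}

def theorem6_holds(p, phi):
    v = ev3(p, phi, trans, div)
    return ((v == TRUE) == sat(p, phi, trans, div)
            and (v == FALSE) == sat(p, comp(phi), trans, div))
```

Item 3 of the theorem follows from items 1 and 2, so checking the first two on every state and formula covers all three cases.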

In [Sti87] the divergence preorder is characterized by intuitionistic HML as
follows. Let $p$ and $q$ be processes in extended transition systems. Then
$p \sqsubseteq q$ just if, for all $\phi$ of positive HML,
$p \models_I \phi \Rightarrow q \models_I \phi$. From Theorem 6 the equivalent condition
in 3-valued HML is $([p \models \phi] = true) \Rightarrow ([q \models \phi] = true)$.
Clearly $\forall \phi : ([p \models \phi] = true) \Rightarrow ([q \models \phi] = true)$ is
equivalent to the condition $\forall \phi : [p \models \phi] \leq [q \models \phi]$ used in
Theorem 5 above. Thus, in this action-based framework, we could have defined
3-valued HML in terms of intuitionistic HML, and then derived our
characterization result from the characterization result of [Sti87].

An advantage of 3-valued modal logic over intuitionistic modal logic is that it
more naturally captures the problem of model-checking partial state spaces. For
example, a 3-valued modal logic leads directly to a model checker that, given
a state and a formula, returns true, false, or $\bot$. In contrast, a model checker
directly based on intuitionistic modal logic would return either true or false.
The value $\bot$ could only be inferred from the results of multiple checks.

7 Conclusions

We developed a simple framework for reasoning about partially-known behaviors
of a system. We showed that the use of 3-valued temporal logics nicely models
the absence of information about unknown parts of the state space of a system.
We then precisely determined, both operationally and logically, the relationship
between a partial state space and a more complete one. We also presented a
model-checking algorithm for 3-valued CTL. This model checker can check any
CTL formula on any partial state space, and returns either a definite answer of
true or false concerning the full state space, or $\bot$ (“I don’t know”) if the
partial state space lacks information needed for a definite answer.

We also compared our results on partial Kripke structures with existing
work on extended transition systems. In the latter framework, we showed that
Hennessy-Milner Logic with our 3-valued interpretation provides an alternative
characterization of the divergence preorder in addition to the intuitionistic
interpretation of Plotkin. Further work on divergence preorders and logics to
characterize them can be found in [Sti87,Wal90]. Verification techniques based on
the divergence preorder are described in [Wal90,CS90]. In all this work logical
formulas are interpreted normally in the 2-valued sense. To our knowledge none
of the work on 3-valued modal logics (e.g., [Seg67,Mor89,Fit92]) shows how these
logics can be used to characterize relations like our completeness preorder.

The model-checking framework developed in this paper could be extended 
so it can be performed “symbolically” following the ideas of [BCM+90]. This 
would require the use of data structures and algorithms for representing and 
manipulating 3-valued formulas, such as Ternary Decision Diagrams [Sas97].

Abstract. We describe a set of remarkably simple algebraic laws governing
microarchitectural components. We apply these laws to incrementally
transform a pipeline containing forwarding, branch speculation and
hazard detection so that all pipeline stages and forwarding logic are
removed. The resulting unpipelined machine is much closer to the reference
architecture, and presumably easier to verify.



1 Introduction 

Transformational laws are well known in digital hardware, and form the basis of
logic simplification and minimization, and of many retiming algorithms.
Traditionally, these laws occur at the gate level: de Morgan’s law being a
classic example. In this paper, we examine whether corresponding
transformational laws hold at the microarchitectural level.

A priori, there is no reason to think that large microarchitectural components
should satisfy any interesting algebraic laws, as they are constructed from
thousands of individual gates. Boundary cases could easily remove any uniformity
that has to exist for simple laws to be present. Yet we have found that when
microarchitectural units are presented in a particular way, many powerful laws
appear. Moreover, as we demonstrate in this paper, these laws by themselves are
powerful enough to allow us to show equivalence of pipelined and non-pipelined
microarchitectures.

We have used this algebraic approach to simplify a pipelined microarchitec- 
ture that uses forwarding, branch speculation and pipeline stalling for hazards. 
The resulting pipeline is very similar to the reference machine specification (i.e. 
no forwarding logic), while still retaining cycle-accurate behavior with the origi- 
nal implementation pipeline. The top-level transformation proof is simple enough 
to be carried out on paper, but we have mechanized enough of the theory in the 
Isabelle theorem prover [20] to have verified it semi-automatically, using Isa- 
belle’s powerful rewriting engine. 

Interestingly, both circuits and laws can be expressed diagrammatically. A
paper proof (transformation using equivalence laws) proceeds as a series of
microarchitecture block diagrams, each an incrementally transformed version of
the last. The laws often have a geometric flavor to them, such as laws to swap two
components with each other, or laws to absorb one component into another. We
find this diagrammatic approach an excellent way to communicate proofs.

For us, the most time-consuming part of this technique has been discovering
the local behavior-preserving laws. It is our experience that these laws are much
easier to discover when one uses the right level of abstraction. In particular,
we encapsulate all control and dataflow information concerning a given
instruction in the pipeline into an abstract data type called a transaction [1,W]. We
have found that not only do transactions reduce the size of microarchitecture
specifications, they also provide enough “auxiliary” state information to make
law-discovery practical.

The rest of the paper gives a brief introduction to our specification language, 
and then discusses many of the laws we have discovered. We then show their use 
by applying the laws in a proof of equivalence between two microarchitectures. 
While space constraints prohibit us from giving the complete proof, the top-level 
proof is sketched diagrammatically in [16]. 



2 Specifying a Pipelined Microarchitecture 

We specify microarchitectures using the Hawk language [4,17]. Hawk allows us to
express modern microarchitectures clearly and concisely, to simulate the
microarchitectures, either directly with concrete values or symbolically, and
provides a formal basis for reasoning about their behavior at source-code level.
Currently Hawk is a set of libraries built on top of the pure functional language
Haskell, which is strongly typed and supports first-class functions and infinite
data structures, such as streams [8,21]. It is this legacy that led us to look for
transformation laws in the first place: one often-cited benefit of purely
functional programs is that they are amenable to verification through equational
reasoning. We wanted to see if such algebraic techniques scaled up to
microarchitectural verification.



2.1 Hawk Signals

Hawk is a purely declarative synchronous specification language, sharing a
semantic base similar to Lustre [7]. The basic data structure underlying Hawk is
the signal, which can be thought of as an infinite sequence of values, one per
clock cycle, and circuits are pure functions from input signals to output signals.
The elements of a signal must belong to the same type.
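To make the signal model concrete, here is a small Python sketch. It is not Hawk code: signals are approximated by finite lists (a prefix of the infinite sequence, one element per clock cycle), and the helper names `delay`, `lift` and `running_sum` are hypothetical.

```python
from itertools import accumulate

def delay(init, sig):
    """Output init on the first cycle, then the input shifted by one cycle."""
    return [init] + sig[:-1] if sig else []

def lift(f, *sigs):
    """Apply a combinational function pointwise, one value per clock cycle."""
    return [f(*vals) for vals in zip(*sigs)]

def running_sum(sig):
    """A stateful circuit: its state (the running total) is local to it."""
    return list(accumulate(sig))

xs = [1, 2, 3]
ys = [10, 20, 30]
```

Each circuit is an ordinary pure function from input signals to an output signal, so `lift(lambda a, b: a + b, xs, ys)` behaves like an adder and `delay(0, xs)` like a register initialized to 0.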

We use a notion of transactions to specify the immediate state of an entire 
instruction as it travels through the microprocessor [1]. A transaction is a re- 
cord with fields containing the instruction’s opcode, source register names and 
values, and the destination register name and its value, plus any additional in- 
formation, like the speculative branch target PC for each branching instruction. 
A microarchitecture is a network of components, each of which processes signals
of transactions.

Figure 1 shows the diagram of a simple one-stage microarchitecture, built out
of transaction signal processors. Each component incrementally assigns values to
various transaction fields, based on the component’s internal state (if any) and
the values of transaction fields assigned by earlier components. A textual Hawk
specification of this circuit consists of a set of mutually-recursive stream
equations between the components. However, in this paper we will represent Hawk
circuits as diagrams.




Fig. 1. One-stage pipeline.

For example, the regFile component has two transaction signal inputs and one
transaction signal output. At a given clock cycle, the first input (called
regFileIn in Figure 1) contains a transaction whose opcode and register name
fields have been initialized, but whose value fields have all been zeroed out.
The second input (called writeback) contains the completed transaction from the
previous clock cycle. The regFile component first updates its internal register
file state, based on the destination register name and value fields of the
writeback input. It then fills in the source operand value fields of the
regFileIn transaction based on the corresponding operand register names and the
updated register file, and outputs the filled in transaction, all within the
same clock cycle.



The alu component examines the opcode and source operand value fields of
the transaction output by regFile. If the opcode is an ALU operation (which
includes branch instructions), the alu component computes the appropriate
result, assigns the result to the destination operand value field of the
transaction, and outputs the transaction along the memIn wire, again within the
same (long) clock cycle. If the opcode is not an ALU operation, the alu
component outputs the transaction unchanged.

The mem component behaves similarly for memory load and store operations. 
Like the regFile component, the mem component has internal state, representing 
the contents of data memory at each clock cycle. This state is updated and 
referenced based on the transactions sent to the mem component. Just as with 
the alu component, all memory and transaction updating occurs within the 
same clock cycle. The mem component sends the completed transaction to a delay 
component (represented in our diagrams as a shaded box), to make it available to 
the ICache and regFile components in the next clock cycle. These transactions 
also become the output of the entire microarchitecture, as is shown by the right- 
most arrow. The initial value output by the delay component is the default 
transaction nopTrans, which represents an “inert” transaction which behaves 
like a NOP instruction, but does not affect the ICache’s program counter.

The ICache component produces new transactions, based on the value of the
current program counter and the contents of program memory (the instruction-set
architectures we consider have separate address spaces for instructions and
data). Both the current PC and the instruction memory contents are internal
to ICache. The ICache takes on its writeback input the completed transaction
from the previous clock cycle. The ICache examines the transaction for branches 
that have been taken. When it finds such an instruction, it modifies its internal 
PC accordingly and starts fetching transactions from the branch target address. 
The ICache has as output a signal of transactions representing the newly-fetched 
instructions. Each transaction’s source and destination operand values are in- 
itialized to zero, since the ICache doesn’t know what values they should have. 
The other pipeline components will fill in these fields with their correct values. 
The ICache has a second input, called stall, which is a signal of Boolean values. 
On clock cycles where stall is asserted, the ICache will output the same tran- 
saction as it did on the previous clock cycle. In this simple microarchitecture, 
stall is always false. In more complex pipelines, the stall signal is typically 
asserted when the pipeline needs to stall due to a branch misprediction. 

For more complex pipelines, we also allow the ICache to perform branch 
prediction, based on an internal branch target buffer. When performing branch 
prediction, the ICache will also annotate branch instruction transactions with the 
predicted branch target PC. A branch_misp component (not shown in Figure 1)
can locally compare the predicted branch target with the actual branch target 
to determine if a branch misprediction has occurred. 



3 Microarchitecture Laws 



With any algebraic reasoning there need to be some ground rules. We take as
fundamental the notion of referential transparency or, in hardware terms, a
circuit duplication law. Any circuit whose output is used in multiple places is
equivalent to duplicating the circuit itself, and using each output once. This
law is shown graphically in Figure 2. Because of the declarative nature of our
specification language, every circuit satisfies this law. That is, it is
impossible within Hawk for a specification of a component to cause hidden
side-effects observable to any other component specification. In many
specification languages this law does not hold universally. For example,
duplicating a circuit that incremented a global variable on every clock cycle
would cause the global variable to be incremented multiple times per clock
period, breaking behavioral equivalence. Hawk circuits can still be stateful,
but all stateful behavior must be local and/or expressed using feedback.
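The contrast between a pure circuit and one with a hidden global side-effect can be illustrated in Python. This is a sketch under assumptions: signals are finite lists, and the names `double`, `impure` and `TICKS` are invented for the example. The impure circuit is exactly the kind of component that cannot be written in Hawk.

```python
def lift(f, *sigs):
    """Apply a combinational function pointwise, one value per clock cycle."""
    return [f(*vals) for vals in zip(*sigs)]

def double(sig):
    """A pure circuit: its output depends only on its input signal."""
    return lift(lambda x: 2 * x, sig)

xs = [1, 2, 3]
shared = double(xs)                                   # one circuit, output used twice
pure_shared = lift(lambda a, b: a + b, shared, shared)
pure_dup = lift(lambda a, b: a + b, double(xs), double(xs))  # circuit duplicated

TICKS = {"n": 0}  # global mutable state, outside any single circuit

def impure(sig):
    """Increments the global counter once per clock cycle (a hidden side-effect)."""
    out = []
    for x in sig:
        TICKS["n"] += 1
        out.append(x + TICKS["n"])
    return out

TICKS["n"] = 0
once = impure(xs)
impure_shared = lift(lambda a, b: a + b, once, once)
TICKS["n"] = 0
impure_dup = lift(lambda a, b: a + b, impure(xs), impure(xs))
```

Duplicating `double` leaves the result unchanged, while duplicating `impure` increments the counter twice as often and changes the output.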

The next few sections introduce many other laws, some of which are specific to 
particular combinations of components, while others are quite widely applicable. 
Each instantiation of a law needs to be proved with respect to the specification 
of the circuit components involved. We have found induction and bisimulation 
to be the most useful ways of proving the laws in this paper, expressed as proofs 
in Isabelle. 




Fig. 2. Universal circuit-duplication law

3.1 Delay Laws

The delay circuit is a fundamental building block of clocked circuits,
especially when combined with feedback. A feedback variant of the circuit
duplication law shown in Figure 3, called the feedback rotation law, allows
circuits to be split along feedback wires. This law is not universal, but it is
valid for any circuit that does not contain zero-delay cycles (amongst others).
Happily, all of the laws we discuss, including the feedback rotation law itself,
preserve a well-formedness property: if a circuit contains no zero-delay cycles,
then any transformed circuit will also have no zero-delay cycles.

Fig. 3. feedback rotation law

The time-invariance law (Figure 4) is also nearly universal. A circuit is
time-invariant if one can retime the circuit by removing the delays from all the
inputs of the circuit and placing new delays on the circuit’s outputs. Any
combinatorial circuit that preserves default values is automatically
time-invariant, but so are stateful circuits like the register file and memory
cache. Interestingly, the ICache is not.

Fig. 4. time-invariance law.
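The time-invariance condition can be checked on small examples. In the sketch below (finite lists standing in for signals, hypothetical helper names, and 0 assumed as the default value carried by delays), a pointwise circuit commutes with delay exactly when it maps the default value to itself, and a simple stateful circuit commutes as well.

```python
from itertools import accumulate

DEFAULT = 0  # assumed default value output by a delay on its first cycle

def delay(sig):
    """One-cycle delay that outputs DEFAULT on the first cycle."""
    return [DEFAULT] + sig[:-1] if sig else []

def lift(f, sig):
    """Apply a combinational function to each cycle of a signal."""
    return [f(x) for x in sig]

def dbl(x):
    return 2 * x      # preserves the default: dbl(0) == 0

def inc(x):
    return x + 1      # does not preserve the default: inc(0) == 1

def running_sum(sig):
    """Stateful circuit with local state; starts from the default value."""
    return list(accumulate(sig))

xs = [3, 1, 4, 1, 5]
dbl_commutes = lift(dbl, delay(xs)) == delay(lift(dbl, xs))
inc_commutes = lift(inc, delay(xs)) == delay(lift(inc, xs))
sum_commutes = running_sum(delay(xs)) == delay(running_sum(xs))
```

The `inc` circuit fails only on the first cycle, where the delayed input yields `inc(0)` rather than the default.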

We use the above laws extensively to remove pipeline stages. If a pipeline
stage is time-invariant, then we can move the pipeline registers (represented as
delay circuits) from before the pipeline stage to afterwards. If subsequent
pipeline stages are also time-invariant, then we can repeat the process,
eventually moving all of the delay circuits to the end of the pipeline. However,
forwarding logic between pipeline stages must still access the appropriate
time-delayed outputs of later pipeline stages. The feedback-rotation law polices
this, and ensures that the appropriate time-delay is kept by forcing delays to be
inserted on all feedback wires to the forwarding circuits.



3.2 Bypasses and Bypass Laws

The purpose of forwarding logic in a pipeline is to ensure that results computed 
in later pipeline stages are available to earlier pipeline stages in time to be 
used. Conceptually, the forwarding logic at each pipeline stage examines its 
current instruction’s source operand register names to see if they match a later 
stage’s destination operand register name. For every matching source operand, 
the operand value is replaced with the result value computed by the later pipeline 
stage. Non-matching source operands continue to use operand values given by 
the preceding pipeline stage.

This conceptual logic can be implemented concisely using transactions. A bypass
circuit (Figure 5) has two inputs, each a signal of transactions. The first input
(inp) contains the transactions from the preceding pipeline stage. The second
input (update) contains the transactions from a subsequent pipeline stage. The
bypass circuit at each clock cycle compares the source operand names of the
current inp transaction with the destination operand names of the current update
transaction. The output of bypass is identical to inp, except that source
operands matching update’s destination operand are updated. Bypasses arise
frequently enough in pipeline specifications that we draw them specially, as
diamonds with the update input connected to either the top or the bottom.

Fig. 5. bypass circuit

Bypass circuits have many nice properties. Not only are they time-invariant and
obey a kind of idempotence (Figure 6), but they also interact closely with
register files and various execution units.

Fig. 6. bypass circuit idempotence law
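A bypass can be sketched over a simplified transaction record. Everything below is an assumption made for illustration: the `Trans` fields, a single destination operand per transaction, finite lists as signals, and the particular idempotence check, which is one plausible reading of "a kind of idempotence" rather than a reproduction of the law in Figure 6.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Trans:
    op: str
    dest: Optional[Tuple[str, int]]     # (register name, value), or None
    srcs: Tuple[Tuple[str, int], ...]   # (register name, value) pairs

NOP_TRANS = Trans("NOP", None, ())      # no destination, no source operands

def bypass1(inp, upd):
    """One clock cycle of bypass: forward upd's result into matching sources."""
    if upd.dest is None:
        return inp
    reg, val = upd.dest
    new_srcs = tuple((r, val if r == reg else v) for (r, v) in inp.srcs)
    return replace(inp, srcs=new_srcs)

def bypass(inp_sig, upd_sig):
    """Bypass lifted to signals, one transaction per cycle."""
    return [bypass1(i, u) for i, u in zip(inp_sig, upd_sig)]

add = Trans("ADD", ("r3", 0), (("r1", 0), ("r2", 0)))
load = Trans("LOAD", ("r1", 42), ())
out = bypass([add, add], [load, NOP_TRANS])
```

In the first cycle the value 42 is forwarded into the r1 source operand; in the second cycle the update is a NOP, so the input passes through unchanged.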



The fundamental interaction between a bypass and register file is shown in
Figure 7. We call this the register-bypass law, and it is used repeatedly in
eliminating forwarding logic when simplifying pipelines. The law states that we
can delay writing a value into the register file, so long as we also forward the
value to be written, in case that register was being read on the same clock cycle.
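Read literally, the prose suggests an equation of the form regFile(inp, wb) = bypass(regFile(inp, delay(wb)), wb). The Python sketch below checks that reading on a short example. It is an interpretation, not the diagram of Figure 7: the `Trans` record, the write-then-read register file, the assumption that unwritten registers read as 0, and the use of finite lists as signals are all choices made here.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

@dataclass(frozen=True)
class Trans:
    op: str
    dest: Optional[Tuple[str, int]]     # (register name, value), or None
    srcs: Tuple[Tuple[str, int], ...]   # (register name, value) pairs

NOP_TRANS = Trans("NOP", None, ())

def delay(init, sig):
    """Output init on the first cycle, then the input shifted by one cycle."""
    return [init] + sig[:-1] if sig else []

def bypass(inp_sig, upd_sig):
    """Forward each update's destination value into matching source operands."""
    out = []
    for inp, upd in zip(inp_sig, upd_sig):
        if upd.dest is not None:
            reg, val = upd.dest
            inp = replace(inp, srcs=tuple((r, val if r == reg else v) for r, v in inp.srcs))
        out.append(inp)
    return out

def reg_file(inp_sig, wb_sig):
    """Each cycle: apply the writeback first, then fill in source operand values."""
    regs, out = {}, []
    for inp, wb in zip(inp_sig, wb_sig):
        if wb.dest is not None:
            regs[wb.dest[0]] = wb.dest[1]
        # Assumption: registers that were never written read as 0.
        filled = tuple((r, regs.get(r, 0)) for r, _ in inp.srcs)
        out.append(replace(inp, srcs=filled))
    return out

rd = Trans("ADD", ("r3", 0), (("r1", 0), ("r2", 0)))
inp = [rd, rd, rd]
wb = [Trans("LOAD", ("r1", 5), ()), Trans("LOAD", ("r1", 7), ()), NOP_TRANS]

lhs = reg_file(inp, wb)                                 # write immediately
rhs = bypass(reg_file(inp, delay(NOP_TRANS, wb)), wb)   # write late, forward now
```

Delaying the writeback means the register file lags by one cycle, and the bypass supplies exactly the value that has not yet been written.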

Initially we considered this law to be a theorem about register files, and 
accordingly we proved that it held for a number of different implementations. 
However, it is also tempting to view this law as an axiom of register files. In 
effect, by using the law repeatedly from right to left, we obtain a specification 
for how the register file must behave for any time prefix. 



Fig. 7. register-bypass law



Hazard-Bypass Law Another bypass law permits the removal of bypasses
between execution units. It is often the case that after retiming all delay circuits
to the end of a pipeline, two execution units in a pipeline (such as an ALU
unit and a Load/Store unit) are connected with one-cycle feedback loops. Each
bypass circuit is forwarding the outputs of an execution unit to the inputs of
that same execution unit, one clock cycle later.

If the upstream pipeline stages can guarantee that there is no hazard between
successive transactions, then the double feedback is equivalent to the single
feedback circuit shown at the bottom of Figure 8. This (conditional) identity is
called the hazard-bypass law.



To be more concrete, suppose exec1 is the ALU and exec2 the memory cache. Then
an ALU-mem hazard arises if a transaction which loads a register value from
memory is immediately followed by an ALU operation which requires that
register’s value. Under these circumstances the two feedback loops would give
different results. Under all other circumstances the two circuits are
equivalent. We express this conditional equivalence using the no_haz component.
It is an example of a projection component and is discussed in the next section.

Fig. 8. hazard-bypass law




3.3 Projection Laws 

Many laws, like the hazard-bypass law above, require that the input signals 
satisfy certain properties, and commonly, we may know that the output signal 
of a given component always satisfies a particular property. We can capture this 
knowledge of properties using signal projections.

A signal projection is a component with one input and one output. As long 
as the input signal satisfies the property of interest, the component acts like an 
identity function, returning the input signal unchanged. However, if the input 
does not satisfy the property we are interested in, the projection component 
modifies the input signal in some arbitrary way so that the property is satisfied. 

Let us consider an example. For the hazard-bypass law we are interested in 
expressing the absence of ALU-mem hazards in a transaction signal. We reify 
this property as a no_haz projection. On each clock cycle, the no_haz component 
compares the current input transaction with the previous input transaction. If 
there is no ALU-mem hazard between the two transactions, then the current 
transaction is output unchanged. If a hazard does exist, then no_haz will instead 
output nopTrans, which is guaranteed not to generate a hazard (since nopTrans 
contains no source operands). 
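The description of no_haz can be sketched as follows. The transaction record, the opcode names, the set `ALU_OPS`, and the choice of a NOP as the "previous" transaction on the first cycle are assumptions for illustration; only the compare-with-previous-input behavior comes from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Trans:
    op: str
    dest: Optional[str]        # destination register name, or None
    srcs: Tuple[str, ...]      # source register names

NOP_TRANS = Trans("NOP", None, ())
ALU_OPS = {"ADD", "SUB"}       # hypothetical set of ALU opcodes

def alu_mem_hazard(prev, cur):
    """A load immediately followed by an ALU operation that reads the loaded register."""
    return prev.op == "LOAD" and cur.op in ALU_OPS and prev.dest in cur.srcs

def no_haz(sig):
    """Projection: identity on hazard-free signals, NOP where a hazard occurs."""
    out, prev = [], NOP_TRANS  # assumption: nothing precedes the first cycle
    for cur in sig:
        out.append(NOP_TRANS if alu_mem_hazard(prev, cur) else cur)
        prev = cur             # compare against the previous input, as in the text
    return out

load = Trans("LOAD", "r1", ("r2",))
use = Trans("ADD", "r3", ("r1", "r4"))
other = Trans("ADD", "r5", ("r6", "r7"))
clean = [load, other, use]
hazardous = [load, use, other]
```

On `clean` the component acts as the identity; on `hazardous` it replaces the dependent ADD with a NOP, and applying it a second time changes nothing further.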

Where do projections come from? After all, they are not the sort of compo- 
nent that microarchitectural designers introduce just for fun. 

Figure 9 provides an example of a law which “generates” a projection. The
hazard-squashing logic guarantees that its output contains no hazards, and this 
is expressed in that the circuit is unchanged when the no_haz component is 
inserted on its output. 

(The hazard component outputs a Boolean on each clock cycle stating
whether its two input transactions constitute a hazard. The kill component
takes a transaction signal and a Boolean signal as inputs. On each clock cycle, if
the Boolean input is false, then kill outputs its input transaction unchanged.
If the Boolean input is true, then kill outputs a nopTrans, effectively “killing”
the input transaction.)

To be useful, a projection component needs to be able to migrate from a source
circuit that produces it (such as the circuit in Figure 9) to a target circuit
that needs the projection to enable an algebraic law (such as the hazard-bypass
law). Thus a projection component must be able to commute with the intervening
circuits between the source and the target circuit. Well-designed projections
commute with many circuits. For instance, the no_haz projection commutes with
bypass, alu, mem, and regFile components. It also commutes with delay components
(that is, no_haz is time-invariant).

Projections are also convenient for expressing the fact that a component 
only uses some of the fields of an input transaction. For instance, the hazard 
component only looks at the opcode, source, and destination register name fields 
of its two input transactions. We can create a projection called proj_ctrl that 
sets every other field of a transaction to a default value, and prove a law stating 
that the hazard component is unchanged when proj_ctrl is added to any of 
its inputs. We can then show that proj_ctrl commutes with other components, 
such as bypasses and delays. This allows us to move the input wires to hazard 
across these other components, which is sometimes necessary to enable other 
laws. Similarly, the proj_branch_info projection allows us to move ICache and
branch_misp component inputs.

4 Transforming the Microarchitecture 

The laws we have been discussing can be used for aggressively restructuring 
microarchitectures while retaining equivalence. We have used them to simplify 
several pipelined microarchitectures with a view to verification. The example 
we present here contains three levels of forwarding logic, resolves hazards by 
stalling the pipeline, and performs branch speculation. The block diagram for 
this microarchitecture is shown in Figure 10. 

By using just algebraic laws, we have been able to reduce most of the com- 
plexity, leaving essentially an unpipelined microarchitecture. We are currently 
implementing the algebraic laws as a rewrite system in Isabelle. For this paper 
we describe our top-level rewrite strategy informally. 








Fig. 9. Hazard-squashing logic guarantees no hazards 



Retiming We first remove all delay circuits from the main pipeline path. We
accomplish this by repeatedly applying the time-invariance law, and by splitting
delays along wires through the circuit duplication and feedback rotation laws.

Fig. 10. Microarchitecture before simplification



Move control wires Next, we move all wires not directly involved with
forwarding logic to either before or after all of the bypass circuits. This is to
enable the hazard-bypass laws, which we apply in a later step. We move the wires
by inserting projection circuits and using the corresponding
projection-commutativity laws.



Propagate hazard information The hazard-bypass laws can only be ap- 
plied when there are no hazards between the affected stages. So we generate a 
no-hazard projection at the end of the dispatch stage (which is justified by a 
projection-absorption law applicable to the kill-circuit complex in that stage), 
and then move it between the first and second bypass circuits. We also use addi- 
tional properties of the proj_ctrl, kill, and regFile circuits (discussed in [16]) 
to swap the hazard/kill complex with the register file, so that the register-bypass 
law can be used more readily in the next step of the simplification. The circuit 
in Figure 11 shows the microarchitecture after this step has been completed.
Notice that the ALU and memory units are now connected exactly as required 
for an application of the hazard-bypass law. 




Fig. 11. Microarchitecture after the “propagate hazard information” step

Remove forwarding logic We can now apply the hazard-bypass law to remove
the bypass circuit just prior to the memory unit. We eliminate the other two
bypass circuits by applying the register-bypass law twice.

Cleanup The pipeline has now been simplified as much as possible, except that 
there are still some extra delay components as well as several unnecessary pro- 
jection circuits. We merge delay components, then move the projection circuits 
back to their places of origin and remove them using the projection laws in the 
opposite direction. 




Fig. 12. Microarchitecture after simplification 



The final microarchitecture is shown in Figure 12. This circuit still outputs 
exactly the same transaction values, cycle-for-cycle, as the microarchitecture in 
Figure 10, but is considerably less complex. We can now apply conventional 
techniques to verify that this microarchitecture is a valid implementation of the 
ISA. 

5 Discussion 

5.1 Related Work 

Hawk is built on top of the pure functional language Haskell, where algebraic 
techniques for transforming functional programs are routinely used for equiva- 
lence checking and verification [2,3,13] and for compilation and optimization [5, 
12]. Much of our work can be seen as an extension of these ideas. Hawk itself is 
very similar in flavor to Lustre [6] except that in Lustre signals are accompanied 
by additional clock information. The Hawk specification style follows from the 
work of Johnson [9], O’Donnell [18], and Sheeran [25]. 

We have also been influenced by the algebraic techniques used in the re- 
lational hardware-description language Ruby [24]. Sizeable Ruby circuits have 
been successfully derived and verified through algebraic manipulation [10,11]. 
What distinguishes our work is our focus on microarchitectural units as objects 
of study in their own right. The Ruby research has emphasized circuits at the 
gate level. 

In terms of verification, our approach is most similar to two known techniques, 
called retiming [14,23,26] and unpipelining [15]. A circuit is retimed when 
the delay components of the circuit are repositioned, while the functional components 
are left unchanged, effectively through repeated applications of the time-invariance 
law. Typically, circuits are retimed to reduce the clock cycle time. In 
contrast, we retime circuits as part of a simplification process. In fact, we often 
use the time-invariance law to increase cycle time! 

Unpipelining [15] is a verification technique where a pipelined microar- 
chitecture, specified as a state machine, is incrementally transformed into a 
functionally-equivalent unpipelined microarchitecture. Unpipelining proceeds by 
repeatedly merging the last stage of a pipeline into the next to last stage, produ- 
cing a microarchitecture with one less stage on each iteration. On each iteration, 
the two microarchitectures are proven equivalent by induction over time. This 
is similar to our approach, except that we use transactions to encapsulate and 
reuse many of the verification steps, and we only need to prove the equivalence 
of the portion of the microarchitecture being transformed, rather than the entire 
microarchitecture, on each iteration. On the other hand, Levitt and Olukotun’s 
implementation of unpipelining is much more automated than our work up to 
now. 

Transactions were a key concept in allowing us to discover and formulate 
many of the algebraic laws of microarchitectural components. Unsurprisingly, the 
usefulness of transactions has been noticed before. Aagaard and Leeser used tran- 
sactions to specify and verify hierarchical networks of pipelines [1], and Onder 
and Gupta have used a similar concept of instruction contexts as a core datatype 
in UPFAST, an imperative microarchitecture simulation language [19]. Further, 
Sawada and Hunt use an extended form of transactions in their verification of 
a speculative out-of-order microarchitecture [22]. Each transaction records two 
snapshots of the entire ISA state, before and after the instruction is executed. 
In their work, however, transactions are not part of the microarchitecture itself, 
but are constructed separately for verification purposes. 

5.2 Next Steps in Microarchitecture Algebra 

As we have come to see it, the main principle of applying algebraic techniques 
to microarchitectures is to use geometric reasoning to move and absorb circuits, 
and to express that reasoning as local equalities whenever possible. Conditional 
equalities can be expressed using projections. 

Some care is required in the definition of basic components. We have striven 
to design the component circuits to satisfy as rich a variety of algebraic laws as 
possible, such as preserving default values, satisfying time-invariance, and so on. 
Sometimes we hit on the correct definitions immediately, but more commonly we 
adapted the definitions over time, admitting more and more laws. One example of 
this is in pipeline registers. Initially, we used conditional delays to act as pipeline 
registers, but since then have found it useful to separate clocked behavior from 
functional behavior, enabling the two dimensions to be manipulated separately. 

In some sense the components we now manipulate are not optimal in terms of 
transistor counts. In particular, many units receive and propagate information 
they are not interested in. However, much of this overhead can be removed 
automatically through a similar set of rewrite laws built around more primitive 
components than those presented in this paper. We plan to write this up in a 
subsequent paper. 
Abstract. In shared-memory multiprocessors sequential consistency offers a natural tradeoff between 
the flexibility afforded to the implementor and the complexity of the programmer’s view of 
the memory. Sequential consistency requires that some interleaving of the local temporal orders 
of read/write events at different processors be a trace of serial memory. We develop a systema- 
tic methodology for proving sequential consistency for memory systems with three parameters 
— number of processors, number of memory locations, and number of data values. From the defi- 
nition of sequential consistency it suffices to construct a non-interfering observer that watches and 
reorders read/write events so that a trace of serial memory is obtained. While in general such an 
observer must be unbounded even for fixed values of the parameters — checking sequential consi- 
stency is undecidable! — we show that for two paradigmatic protocol classes — lazy caching and 
snoopy cache coherence — there exist finite-state observers. In these cases, sequential consistency 
for fixed parameter values can thus be checked by language inclusion between finite automata. 

In order to reduce the arbitrary-parameter problem to the fixed-parameter problem, we deve- 
lop a novel framework for induction over the number of processors. Classical induction schemas, 
which are based on process invariants that are inductive with respect to an implementation preorder 
that preserves the temporal sequence of events, are inadequate for our purposes, because proving 
sequential consistency requires the reordering of events. Hence we introduce merge invariants, 
which permit certain reorderings of read/write events. We show that under certain reasonable 
assumptions about the memory system, it is possible to conclude sequential consistency for any 
number of processors, memory locations, and data values by model checking two finite-state lem- 
mas about process and merge invariants: they involve two processors each accessing a maximum 
of three locations, where each location stores at most two data values. For both lazy caching 
and snoopy cache coherence we are able to discharge the two lemmas using the model checker 
MOCHA. 

1 Introduction 

Shared-memory multiprocessors are an important class of supercomputing systems. In 
recent years a number of such systems have been designed in both academia and industry. 
The design of a correct and efficient shared memory is one of the most difficult tasks in the 
design of such systems. The shared-memory interface is a contract between the designer 
and the programmer of the multiprocessor. In general, there is a tradeoff between the 
ease of programming and the flexibility of shared-memory semantics necessary for an 
efficient implementation. Not surprisingly, a number of abstract shared-memory models 
have been developed. 

All abstract memory models can be understood in terms of the fundamental serial- 
memory model. A serial memory behaves as if there is a centralized memory that ser- 
vices read and write requests atomically such that a read to a location returns the latest 
value written to that location. Coherence¹ requires that the global temporal order of 
events (reads and writes) at different processors be a trace of serial memory. Sequen- 
tial consistency [Lam79] ignores the global temporal order and requires only that some 
interleaving of the local temporal orders of events at different processors be a trace of 
serial memory. Although sequential consistency is a strictly weaker property than cohe- 
rence, the absence of a synchronizing global clock between the different processors in a 
multiprocessor makes a sequentially consistent memory indistinguishable from a serial 
memory. Compared to coherence, sequential consistency clearly offers more flexibility 
for an efficient implementation; yet, most real systems that claim to be sequentially 
consistent actually end up implementing coherence. In an effort to get more flexibility 
for implementation, memory models that relax local temporal order of events at each 
processor have been developed in recent years. This has been achieved at the cost of 
complicating the programmer’s interface. These memory models such as weak ordering, 
partial store ordering, total store ordering, and release consistency [AG96] relax the pro- 
cessor order of events in different ways and provide fence or synchronization operations 
across which sequentially consistent behavior is guaranteed. 

We focus on the verification of sequential consistency for two reasons. First, the 
interface provided by sequential consistency is clear, easy to understand, and widely 
believed to be the correct tradeoff between implementation flexibility and complexity 
of the programmer’s view of shared memory. In fact, there is a trend of thought [Hil98] 
that considers the performance gains achieved by relaxed semantics not worth the ad- 
ded complexity of the programmer’s interface and advocates sequential consistency as 
the shared-memory interface for future multiprocessors. Second, even relaxed memory 
models have fence operations across which sequentially consistent behavior should be 
observed. Hence, the techniques developed in this paper will be useful for their verifi- 
cation also. 

High-level descriptions of shared-memory systems are typically parameterized by 
the number n of processors, the number m of memory locations, and the number v of 
data values that can be written in a memory location. A parameterized memory system 
consists of a central-control part C and a processor part P. Both C and P are functions 
that take values for m and v and return a finite-state process. An instantiation of the 
system containing n processors, m memory locations, and v data values is constructed 
by composing C(m, v) with n copies of P(m, v). We would like to verify sequential 
consistency for all values of the parameters. However, sequential consistency is not a 
local property; correctness for m processors (locations, values) cannot be deduced by 
reasoning about individual processors (locations, values). The following observations 
about real shared-memory systems, which we assume in our modeling, are crucial for 
our results. We assume that the memory system is monotonic and symmetric with respect 

¹ Implementors of cache-based shared-memory systems have used the notion of cache coherence 
for a long time but the definition of coherence as stated here was first given in [ABM93]. 
to both the set of locations and the set of data values. Monotonicity in locations means 
that every run of the system projected onto a subset of locations is a run of the system 
with just that subset of locations. Monotonicity in data values means that a sequence is 
a run of the system with some set of possible data values if and only if it is a run of the 
system with a larger set of data values. Symmetry in locations means that, if $\sigma$ is a run 
of the memory system, and $\lambda_l$ is a permutation on the set of locations, then $\lambda_l(\sigma)$ is also 
a run of the memory system. Finally, symmetry in data values means that, if $\sigma$ is a run of 
the memory system, and $\lambda_v$ is any function from data values to data values, then $\lambda_v(\sigma)$ 
is also a run of the memory system. 

Even for fixed values of the parameters, checking if a memory system is sequentially 
consistent is undecidable [AMP96]. The main reason for the problem being undecidable 
is that the specification of sequential consistency allows a processor to read the value 
at a location after an unbounded number of succeeding writes to that location by other 
processors. In real systems, finite resources such as buffers and queues bound the number 
of writes that can be pending. It is sufficient to construct a witness that observes the reads 
and writes occurring in the system (without interfering with it) and reorders them while 
preserving the order of events in each processor such that a trace of serial memory is 
obtained. We call such a witness an observer. If a finite-state observer exists, then it 
can be composed with a fixed-parameter instantiation of the memory system and the 
problem of deciding sequential consistency is reduced to a language-containment check 
between two finite-state automata which can be discharged by model checking. In the 
concrete examples we have looked at (see below), we have indeed seen that a finite-state 
observer exists for fixed values of the parameters. 

However, our goal is to verify sequential consistency for arbitrary values of the 
parameters. Towards this end, we first develop a novel inductive proof framework for 
proving sequential consistency for any number n of processors, given fixed m and v. 
Inductive proofs on parameterized systems [KM89] use an implementation preorder and 
show the existence of a process invariant such that the composition of the invariant with 
an additional process is smaller than the process invariant in the preorder. The preorders 
typically used — for instance, trace containment and simulation — preserve the temporal 
sequence of events. Since we check a sufficient condition for sequential consistency 
by the mechanism of an observer that reorders the read/write events of the processors 
in the system, preorders that preserve the temporal sequence of events do not suffice 
for our purpose. Our inductive proof strategy first determines a process invariant $I_1$ of 
the memory system with respect to the trace-containment preorder to get a finite-state 
abstraction that can generate all sequences of observable actions for any number of 
processes. We then find a merge invariant $I_2$ such that (1) the single-process memory 
system containing $I_2$ is sequentially consistent, and (2) there is an observer that maps 
every run $\sigma$ of $I_2 \| P$ that can be produced in an environment of $I_1$ to a run $\sigma'$ of $I_2$, 
such that the read/write events in $\sigma'$ are an interleaving of the read/write events of $I_2$ 
and $P$ in $\sigma$, and the traces obtained from $\sigma$ and $\sigma'$ are identical. Given a run $\gamma$ of the 
memory system with n > 1 processors, we use the observer to create a run $\gamma'$ of the 
memory system with n - 1 processors, such that $\gamma$ and $\gamma'$ are identical when projected 
to the events of the first n - 2 processors, and the read/write events of the (n - 1)-st 
processor in $\gamma'$ are an interleaving of the read/write events of the (n - 1)-st and n-th 
processors in $\gamma$. By doing this n times, we generate a run of the memory system with a 
single processor, which is sequentially consistent by the base case of the induction. 

The induction demonstrates sequential consistency for any number of processors, but 
only for given m and v. We would like sufficient conditions under which using fixed values for 
m and v lets us conclude sequential consistency for all m and v. To that end, we impose 
three requirements on the process and merge invariants. The first two requirements — 
symmetry and monotonicity on memory locations — are identical to the corresponding 
assumptions on the memory system. The third requirement is called location independence. 
A process is location independent if it has the property that a sequence of events 
is a run of the process with m locations if the m sequences obtained by projecting onto 
individual memory locations are runs of the process with a single location. We show 
that if the two invariants satisfy location symmetry, location monotonicity, and location 
independence, and the observer is location and data independent, then it suffices to do the 
induction for three memory locations and two data values. As a result, the correctness 
of the memory system can be proved by discharging two finite-state lemmas using a 
model checker — one that proves the correctness of the process invariant, and another 
that proves the correctness of the merge invariant. 

Our proof framework can be applied to a variety of protocols; in particular, all 
cache-coherence protocols described in [AB86] fall into its domain. We demonstrate 
the method by verifying two example protocols — lazy caching [ABM93] and a snoopy 
cache-coherence protocol [HP96]. The correctness of lazy caching has been establis- 
hed before by manual proofs [ABM93,Gra94,LLOR99]. The correctness of the snoopy 
cache-coherence protocol is argued informally in [HP96]. Finite-state observers exist 
for both these examples. In both cases, the proof of a parameterized system was reduced 
to finite-state lemmas in the way described above, and discharged by our model checker 
MOCHA [AHM+98]. Manual effort was required to construct the process and merge 
invariants, and the observer, and to verify that the assumptions on the memory system 
and the requirements on the invariants and observer are indeed satisfied. 

Related work. We use process induction for the verification of an abstract me- 
mory model. We list related work along two axes — work that verifies abstract memory 
models, and work that verifies systems with an arbitrary number of processes. [MS91, 
CGH+93,EM95] verify finite instantiations of parameterized memory systems using au- 
tomatic techniques. [PD95] automatically proves correctness for an arbitrary number of 
processors but is limited to coherence. [LD92,LLOR99,PD96,PSCH98] verify abstract 
shared-memory models for all values of parameters but the proofs are not automatic. 
[GMG91,Gra94,NGMG98] offer sufficient conditions for the satisfaction of sequential 
consistency that can then be checked on the memory system. [KM89] gives an inductive 
proof framework for proving the correctness of parameterized systems. [BCG89,WL89, 
ID96,GS97,EN98] verify parameterized systems but they are not concerned with the 
specific problem of verifying memory models. 
2 Parameterized Memory Systems 

2.1 I/O-Processes 

We use I/O-processes that synchronize on observable actions to model memory systems. 
Formally, an I/O-process $A$ is a 5-tuple $\langle Priv(A), Obs(A), S(A), S_I(A), T(A) \rangle$ with 
the following components: 

- A set $Priv(A)$ of private actions and a set $Obs(A)$ of observable actions, such that 
$Priv(A) \cap Obs(A) = \emptyset$. The set $Act(A)$ is the union of $Priv(A)$ and $Obs(A)$. 
Private actions are outputs, whereas observable actions can be both inputs and outputs. 
The set of extended actions $\Pi(A)$ is given by $Priv(A) \times \{out\} \cup Obs(A) \times \{in, out\}$. 

- A finite set $S(A)$ of states. 

- A set $S_I(A) \subseteq S(A)$ of initial states. 

- A transition relation $T(A) \subseteq S(A) \times \Pi(A) \times S(A)$ satisfying the property that 
for all $s \in S(A)$ and $\pi \in Obs(A) \times \{in\}$, there is a state $s' \in S(A)$ such that 
$(s, \pi, s') \in T(A)$. 

For all $\pi \in \Pi(A)$, the first component is denoted by $First(\pi)$ and the second component 
by $Second(\pi)$. A sequence of extended actions $\pi_1, \pi_2, \ldots, \pi_k$ of $A$ is a run if there 
exist states $s_0, s_1, s_2, \ldots, s_k$ such that $s_0 \in S_I(A)$ and $(s_{i-1}, \pi_i, s_i) \in T(A)$ for all 
$0 < i \le k$. The projection operators First and Second are extended to runs in the 
natural way. The set of all runs of the I/O-process $A$ is denoted by $\Sigma(A)$. A run is 
closed if $Second(\pi_i) = out$ for all actions in the run. For any set $\beta \subseteq Act(A)$, the 
restriction of the run $\sigma$ to $\beta$ is the subsequence obtained by considering the elements 
from $\beta \times \{in, out\}$ in $\sigma$, and is denoted by $[\sigma]_\beta$. For any run $\sigma$ of I/O-process $A$, the 
restriction of $\sigma$ to $Obs(A)$ is called a trace. We say that $tr(\sigma)$ is the trace obtained from 
the run $\sigma$. The set of all traces of the I/O-process $A$ is denoted by $\Gamma(A)$. 
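The definitions of runs, closed runs, restriction, and traces can be made concrete with a small sketch. The Python encoding below (a frozen dataclass, extended actions as `(action, direction)` pairs, and the one-state process `toy`) is an assumption made for illustration and is not part of the paper.

```python
from dataclasses import dataclass

IN, OUT = "in", "out"

@dataclass(frozen=True)
class IOProcess:
    priv: frozenset    # Priv(A): private actions
    obs: frozenset     # Obs(A): observable actions
    states: frozenset  # S(A)
    init: frozenset    # S_I(A)
    trans: frozenset   # T(A): triples (s, (action, direction), s')

    def is_input_enabled(self):
        """Every state accepts every observable action as an input."""
        return all(
            any(s0 == s and ext == (a, IN) for (s0, ext, _) in self.trans)
            for s in self.states for a in self.obs
        )

    def is_run(self, seq):
        """True if the extended actions in seq label a path from an initial state."""
        current = set(self.init)
        for ext in seq:
            current = {t for (s, e, t) in self.trans if s in current and e == ext}
        return bool(current)

def restrict(seq, beta):
    """[seq]_beta: keep the extended actions whose action lies in beta."""
    return [(a, d) for (a, d) in seq if a in beta]

def trace(proc, seq):
    """tr(seq): restriction of a run to the observable actions of proc."""
    return restrict(seq, proc.obs)

def is_closed(seq):
    """A run is closed if every extended action in it is an output."""
    return all(d == OUT for (_, d) in seq)

# Hypothetical one-state process with one private and one observable action.
toy = IOProcess(
    priv=frozenset({"tick"}),
    obs=frozenset({"req"}),
    states=frozenset({0}),
    init=frozenset({0}),
    trans=frozenset({(0, ("tick", OUT), 0), (0, ("req", IN), 0), (0, ("req", OUT), 0)}),
)
run = [("tick", OUT), ("req", IN), ("req", OUT)]
```

Here `run` is a run of `toy` that is not closed, because it contains an input; its trace keeps only the two `req` events.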

Let $A_1$ and $A_2$ be two I/O-processes. We say that $A_1$ refines $A_2$, denoted by $A_1 \preceq A_2$, 
if (1) $Obs(A_1) \subseteq Obs(A_2)$, and (2) every trace of $A_1$ is a trace of $A_2$. The 
I/O-processes $A_1$ and $A_2$ are compatible if (1) $Priv(A_1) \cap Act(A_2) = \emptyset$ and 
(2) $Priv(A_2) \cap Act(A_1) = \emptyset$. The composition $A = A_1 \| A_2$ of two compatible 
I/O-processes $A_1$ and $A_2$ is the I/O-process $A$ such that 

- $Priv(A) = Priv(A_1) \cup Priv(A_2)$, and $Obs(A) = Obs(A_1) \cup Obs(A_2)$. 

- $S(A) = S(A_1) \times S(A_2)$, and $S_I(A) = S_I(A_1) \times S_I(A_2)$. 

- $((s_1, s_2), \pi, (t_1, t_2)) \in T(A)$ iff one of the following three conditions holds: 

1. $\pi = (a, in)$ and for $k = 1, 2$, if $a \in Act(A_k)$ then 
$(s_k, (a, in), t_k) \in T(A_k)$, otherwise $s_k = t_k$. 

2. $\pi = (a, out)$ and $(s_1, (a, out), t_1) \in T(A_1)$, and if $a \in Act(A_2)$ then 
$(s_2, (a, in), t_2) \in T(A_2)$, otherwise $s_2 = t_2$. 

3. $\pi = (a, out)$ and $(s_2, (a, out), t_2) \in T(A_2)$, and if $a \in Act(A_1)$ then 
$(s_1, (a, in), t_1) \in T(A_1)$, otherwise $s_1 = t_1$. 

Suppose that $A_1$ and $A_2$ are compatible I/O-processes. A run $\sigma = \pi_1, \pi_2, \ldots, \pi_k$ of 
$A_1$ can be closed by $A_2$ if there is a closed run $\sigma'$ of $A_1 \| A_2$ such that 
$First(\sigma) = First([\sigma']_{Act(A_1)})$. 
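The three transition conditions of the composition can be sketched directly. The following Python is a minimal illustration under assumed encodings (a `namedtuple` for processes, the helper names `actions`, `step`, and `compose`, and the two toy processes `sender` and `receiver`); none of these names come from the paper.

```python
from collections import namedtuple
from itertools import product

IOProcess = namedtuple("IOProcess", "priv obs states init trans")
IN, OUT = "in", "out"

def actions(p):
    """Act(A) = Priv(A) union Obs(A)."""
    return p.priv | p.obs

def compatible(a1, a2):
    """Private actions of one process are unknown to the other."""
    return not (a1.priv & actions(a2)) and not (a2.priv & actions(a1))

def step(p, s, a, direction):
    """Successors of s on (a, direction); p stays in s if a is not its action."""
    if a not in actions(p):
        return {s}
    return {t for (s0, ext, t) in p.trans if s0 == s and ext == (a, direction)}

def compose(a1, a2):
    if not compatible(a1, a2):
        raise ValueError("processes are not compatible")
    priv, obs = a1.priv | a2.priv, a1.obs | a2.obs
    states = frozenset(product(a1.states, a2.states))
    trans = set()
    for (s1, s2) in states:
        for a in obs:  # condition 1: every process that knows a takes the input
            for t1, t2 in product(step(a1, s1, a, IN), step(a2, s2, a, IN)):
                trans.add(((s1, s2), (a, IN), (t1, t2)))
        for a in priv | obs:
            if a in actions(a1):  # condition 2: a1 outputs, a2 listens
                for t1, t2 in product(step(a1, s1, a, OUT), step(a2, s2, a, IN)):
                    trans.add(((s1, s2), (a, OUT), (t1, t2)))
            if a in actions(a2):  # condition 3: a2 outputs, a1 listens
                for t1, t2 in product(step(a1, s1, a, IN), step(a2, s2, a, OUT)):
                    trans.add(((s1, s2), (a, OUT), (t1, t2)))
    init = frozenset(product(a1.init, a2.init))
    return IOProcess(priv, obs, states, init, frozenset(trans))

# Hypothetical example: a sender that works privately, then emits "msg".
sender = IOProcess(
    priv=frozenset({"work"}), obs=frozenset({"msg"}),
    states=frozenset({0, 1}), init=frozenset({0}),
    trans=frozenset({(0, ("work", OUT), 1), (1, ("msg", OUT), 0),
                     (0, ("msg", IN), 0), (1, ("msg", IN), 1)}),
)
receiver = IOProcess(
    priv=frozenset(), obs=frozenset({"msg"}),
    states=frozenset({"idle", "got"}), init=frozenset({"idle"}),
    trans=frozenset({("idle", ("msg", IN), "got"), ("got", ("msg", IN), "got")}),
)
system = compose(sender, receiver)
```

In `system`, the private `work` step moves only the sender, while the sender's output of `msg` synchronizes with the receiver's input of the same action.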
2.2 Parameterized Memory Systems 

A parameterized memory system M has three parameters — the number n of processors, 
the number m of memory locations, and the number v of data values. The parameterized 
memory system M is built from two parameterized I/O-processes C and P which have 
two parameters — the number m of memory locations, and the number v of data values. 
Intuitively, the I/O-process P represents a single processor in the system and C represents 
a central controller. The I/O-process M(n, m, v) is built from the I/O-processes C(m, v) 
and P(m, v) by composing C(m, v) and n copies of P(m, v). Given n > 0, m > 0, and 
v > 0, the memory system M(n, m, v) is an I/O-process that has processors numbered 
from 1 ... n, memory locations numbered from 0 ... m - 1, and data values numbered 
from 0 ... v - 1. 

We now formally define a parameterized memory system. Let $N$ be the set of all 
non-negative integers. For any $k > 0$, let $N_k$ denote the set of all non-negative integers 
less than $k$. A parameterized memory system is a pair $(C, P)$ such that both $C$ and $P$ are 
functions that map $N \setminus \{0\} \times N \setminus \{0\}$ to I/O-processes such that for all $m > 0$ and $v > 0$, 
we have that $Priv(C(m, v)) = PrivNames_C \times N_m \times (N_v \cup \{\bot\})$, 
$Obs(C(m, v)) = ObsNames_C \times N_m \times (N_v \cup \{\bot\})$, 
$Priv(P(m, v)) = PrivNames_P \times N_m \times (N_v \cup \{\bot\})$, and 
$Obs(P(m, v)) = ObsNames_P \times N_m \times (N_v \cup \{\bot\})$, where $PrivNames_C$, 
$ObsNames_C$, $PrivNames_P$, and $ObsNames_P$ are finite sets that satisfy the following 
properties: 

1. $PrivNames_C \cap ObsNames_C = \emptyset$, and $PrivNames_P \cap ObsNames_P = \emptyset$. 

2. $PrivNames_C \cap (ObsNames_P \cup PrivNames_P \times N) = \emptyset$, and 
$ObsNames_C \cap (PrivNames_P \times N) = \emptyset$. 

3. $R \in PrivNames_P$ and $W \in PrivNames_P$. 

The functions name, loc, and val are defined on $Act(C(m, v)) \cup Act(P(m, v))$, and 
extract respectively the first, second, and third components of the actions. Given some $m$ 
and $v$, let $RdWr(m, v)$ be the union of the set of read actions $\{(R, j, k) \mid j < m \text{ and } k < v\}$ 
and the set of write actions $\{(W, j, k) \mid j < m \text{ and } k < v\}$. 

For all $m$ and $v$ and for all $k > 0$, let $P_k(m, v)$ denote the I/O-process that is obtained 
from $P(m, v)$ by renaming every private action $a$ to action $a'$, such that (1) $name(a')$ is 
the pair $(name(a), k)$, (2) $loc(a') = loc(a)$, and (3) $val(a') = val(a)$. A parameterized 
memory system defines a function that maps $N \times N \setminus \{0\} \times N \setminus \{0\}$ to I/O-processes 
as follows: 

$M(0, m, v) = C(m, v)$ 

$M(n + 1, m, v) = M(n, m, v) \| P_{n+1}(m, v)$ 

For particular $n$, $m$, $v$, we say that $M(n, m, v)$ is a memory system. Note that $M(n, m, v)$ 
is compatible with $P_{n+1}(m, v)$, due to the renaming of private actions in $P_{n+1}(m, v)$, 
and the conditions on the names of private and observable actions of $C$ and $P$ described 
above. The observable actions of $M(n, m, v)$ are the same for all $n > 0$. We define a 
function proc on the set of actions $\bigcup_{k, m, v} Priv(P_k(m, v))$ such that if $a$ is a private 
action of $P_k(m, v)$, then $proc(a) = k$. 
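The renaming that turns $P(m, v)$ into $P_k(m, v)$ can be illustrated on the read and write actions alone. In the sketch below, actions are assumed to be Python tuples `(name, location, value)`, and the helper names `rd_wr`, `rename`, `proc`, and `renamed_rd_wr` are hypothetical; other private action names of a real protocol are omitted.

```python
from itertools import product

def rd_wr(m, v):
    """RdWr(m, v): read and write actions (name, location, value)."""
    return {(name, j, k) for name, j, k in product(("R", "W"), range(m), range(v))}

def rename(action, k):
    """Tag a private action of P(m, v) with the processor index k, as in P_k(m, v)."""
    name, loc, val = action
    return ((name, k), loc, val)

def proc(action):
    """Processor index carried by a renamed private action."""
    (_, k), _, _ = action
    return k

def renamed_rd_wr(n, m, v):
    """Renamed read/write actions of processors 1..n."""
    return {rename(a, k) for k in range(1, n + 1) for a in rd_wr(m, v)}
```

Because the processor index becomes part of the action name, the renamed action sets of different processors are disjoint, which is what makes each additional copy of the processor compatible with the system built so far.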
2.3 Sequential Consistency 

Let $Memop(n, m, v)$ be the union of the sets $\{((R, i), j, k) \mid 0 < i \le n \text{ and } j < m \text{ and } k < v\}$ 
and $\{((W, i), j, k) \mid 0 < i \le n \text{ and } j < m \text{ and } k < v\}$. Thus 
$Memop(n, m, v)$ denotes the set of read and write operations of $M(n, m, v)$. The functions 
name, loc, and val, which were originally defined on actions of $P(m, v)$ and 
$C(m, v)$, can be defined analogously on actions of $M(n, m, v)$. Thus, the four functions 
name, loc, val, and proc are defined on all members of $Memop(n, m, v)$. We use 
$Memop$ to denote the set $\bigcup_{n, m, v} Memop(n, m, v)$. 

Let $\sigma = \pi_1, \pi_2, \ldots, \pi_k$ be a sequence in $Memop^*$, the set of finite sequences with 
elements from $Memop$. The abstraction of $\sigma$, denoted by $\mathcal{A}(\sigma)$, is a labeled directed 
graph $\langle V, E, L \rangle$, where $V$ is a finite set of vertices, $E \subseteq V \times V$, and $L$ is a function 
from $V$ to $Memop(n, m, v)$, such that (1) $V = \{1, 2, \ldots, k\}$, (2) for all $i \in V$, we have 
that $L(i) = \pi_i$, and (3) for all $x, y \in V$, we have that $(x, y) \in E$ iff $proc(L(x)) = 
proc(L(y))$ and $x < y$. We observe that for every sequence $\sigma \in Memop^*$, the abstraction 
$\mathcal{A}(\sigma)$ is an acyclic graph. Thus, we can obtain total orderings of the vertices in $\mathcal{A}(\sigma)$ that 
respect the dependencies specified by its edges. Since the edges form a partial order, 
several such total orders, which are called linearizations of $\sigma$, may exist. Formally, a 
one-to-one mapping $f : V \to V$ is a total order of $\mathcal{A}(\sigma) = \langle V, E, L \rangle$ if for all $x, y \in V$, 
whenever $(x, y) \in E$ we have that $f^{-1}(x) < f^{-1}(y)$. If $f$ is a total order of $\mathcal{A}(\sigma)$, 
then the sequence $L(f(1)), L(f(2)), \ldots, L(f(|V|))$ of actions in $Memop(n, m, v)$ is a 
linearization of $\sigma$. 
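As an illustration of the abstraction graph and its total orders, the sketch below encodes memory operations as tuples `((name, processor), location, value)` and a candidate order `f` as a Python list whose entry at position p is the vertex f(p). These encodings and function names are assumptions chosen for the example.

```python
def proc(op):
    """Processor index of a memory operation ((name, i), loc, val)."""
    (_, i), _, _ = op
    return i

def abstraction_edges(seq):
    """Edges (x, y), 1-based, between operations of the same processor with x < y."""
    k = len(seq)
    return {
        (x, y)
        for x in range(1, k + 1)
        for y in range(x + 1, k + 1)
        if proc(seq[x - 1]) == proc(seq[y - 1])
    }

def is_total_order(f, seq):
    """f[p-1] is the vertex placed at position p; check it respects every edge."""
    k = len(seq)
    if sorted(f) != list(range(1, k + 1)):
        return False  # f must be a one-to-one mapping on the vertices
    pos = {vertex: p for p, vertex in enumerate(f, start=1)}  # inverse of f
    return all(pos[x] < pos[y] for (x, y) in abstraction_edges(seq))

def linearization(f, seq):
    """The sequence L(f(1)), L(f(2)), ..., L(f(|V|))."""
    return [seq[vertex - 1] for vertex in f]

# Processor 1 writes then reads location 0; processor 2 reads in between.
sigma = [(("W", 1), 0, 1), (("R", 2), 0, 0), (("R", 1), 0, 1)]
```

For `sigma`, the only edge joins the two operations of processor 1, so moving processor 2's read to the front is allowed, while reversing the whole sequence is not.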

We are interested in defining which sequences from $Memop^*$ are serial. Intuitively, 
a sequence from $Memop^*$ is serial if it can be produced by serial memory where each 
read from a location returns the value written by the last write to that location. We 
state this formally below. Let $\sigma = \pi_1, \pi_2, \ldots, \pi_k$ be a sequence in $Memop^*$. We define 
$lastwrite_\sigma$ as a function that associates with each position $i$ in $\sigma$ the position $j$ in $\sigma$ where 
the most recent write to the location $loc(\pi_i)$ was done. Formally, $lastwrite_\sigma$ is a mapping 
from the set $\{1, 2, \ldots, k\}$ to $\{1, 2, \ldots, k\} \cup \{\bot\}$ such that $lastwrite_\sigma(i) = j$ if 
there exists a $j$ such that $j < i$, $loc(\pi_i) = loc(\pi_j)$, $name(\pi_j) = (W, n_1)$ for some $n_1$, 
and there does not exist any $j'$ with $j < j' < i$, $name(\pi_{j'}) = (W, n_2)$ for some $n_2$, 
and $loc(\pi_{j'}) = loc(\pi_i)$; otherwise $lastwrite_\sigma(i) = \bot$. The sequence $\sigma$ is serial if for 
all $i \le k$, if $lastwrite_\sigma(i) \ne \bot$, then $val(\pi_i) = val(\pi_{lastwrite_\sigma(i)})$. 
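The serial-memory condition can be checked mechanically on a finite sequence. The sketch below assumes operations are tuples `((name, processor), location, value)` and uses `None` for the undefined position. One further assumption is made explicit in the code: the value check is applied to read operations only, following the intuitive statement that each read returns the value of the last write.

```python
def is_write(op):
    return op[0][0] == "W"

def lastwrite(seq, i):
    """1-based position of the most recent write before position i to the
    location of seq[i-1], or None (standing for ⊥) if there is none."""
    loc = seq[i - 1][1]
    for j in range(i - 1, 0, -1):
        if is_write(seq[j - 1]) and seq[j - 1][1] == loc:
            return j
    return None

def is_serial(seq):
    """Each read returns the value of the last preceding write to its location.
    Assumption: writes themselves are not constrained by earlier writes."""
    for i in range(1, len(seq) + 1):
        if is_write(seq[i - 1]):
            continue
        j = lastwrite(seq, i)
        if j is not None and seq[i - 1][2] != seq[j - 1][2]:
            return False
    return True

fresh = [(("W", 1), 0, 1), (("R", 2), 0, 1)]  # read sees the written value
stale = [(("W", 1), 0, 1), (("R", 2), 0, 0)]  # read sees an older value
early = [(("R", 2), 0, 0), (("W", 1), 0, 1)]  # read precedes every write
```

A read that precedes every write to its location has no last write, so the definition places no constraint on the value it returns.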

For the following definitions, we extend the abstraction function $\mathcal{A}$ to operate on 
arbitrary sequences $\sigma$ by first restricting it to actions in $Memop$. Formally, for any $\sigma$, we 
have that $\mathcal{A}(\sigma) = \mathcal{A}([\sigma]_{Memop})$. We extend $\mathcal{A}$ to operate on sequences of extended actions 
by operating it on the first component of each extended action. Formally, if $\sigma$ is a sequence 
of extended actions, then $\mathcal{A}(\sigma) = \mathcal{A}(First(\sigma))$. Let $\Sigma_M = \bigcup_{n, m, v} \Sigma(M(n, m, v))$. 
Then $\mathcal{A}$ is defined for all sequences in $\Sigma_M$. 

Definition 1 (Observer). Let $M$ be a parameterized memory system. A function $\Omega$ 
from $\Sigma_M$ to $Memop^*$ is an observer for the memory system $M(n, m, v)$ if for every run 
$\sigma \in \Sigma(M(n, m, v))$, the sequence $\Omega(\sigma)$ is a linearization of $\sigma$. The observer $\Omega$ is a 
serializer for $M(n, m, v)$ if for every run $\sigma \in \Sigma(M(n, m, v))$, the sequence $\Omega(\sigma)$ is 
serial. 

Definition 2 (Sequential consistency [Lam79]). Let $M$ be a parameterized memory 
system. The memory system $M(n, m, v)$ is sequentially consistent if it has a serializer. 
The parameterized memory system $M$ is sequentially consistent if $M(n, m, v)$ is 
sequentially consistent for all $n > 0$, $m > 0$, and $v > 0$. 
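For a single finite sequence of memory operations, the existence of a serial linearization can be tested by exhaustive search. The sketch below is only an illustration of the two definitions on one sequence, not the finite-state observer construction of the paper: it tries every reordering that keeps each processor's operations in order and returns the first serial one. The tuple encoding `((name, processor), location, value)` and the function names are assumptions, and the search is exponential in the sequence length.

```python
from itertools import permutations

def proc(op):
    return op[0][1]

def is_write(op):
    return op[0][0] == "W"

def is_serial(seq):
    """Each read returns the value of the latest earlier write to its location;
    reads that precede every write to their location are unconstrained."""
    last = {}  # location -> value of the latest write seen so far
    for op in seq:
        _, loc, val = op
        if is_write(op):
            last[loc] = val
        elif loc in last and last[loc] != val:
            return False
    return True

def preserves_processor_order(order, seq):
    """order is a permutation of indices into seq."""
    latest = {}
    for idx in order:
        p = proc(seq[idx])
        if latest.get(p, -1) > idx:
            return False
        latest[p] = idx
    return True

def find_serial_linearization(seq):
    """Return some serial linearization of seq, or None if there is none."""
    for order in permutations(range(len(seq))):
        if preserves_processor_order(order, seq):
            candidate = [seq[i] for i in order]
            if is_serial(candidate):
                return candidate
    return None

# Processor 2 reads a stale value after processor 1's write in global time.
stale = [(("W", 1), 0, 1), (("R", 2), 0, 0)]
# Processor 1 reads a value that differs from its own preceding write.
broken = [(("W", 1), 0, 1), (("R", 1), 0, 2)]
```

The sequence `stale` is not serial as it stands, yet it has a serial linearization in which the read is moved before the write; this is the kind of reordering that sequential consistency permits and coherence does not. The sequence `broken` has no serial linearization, because both operations belong to the same processor and cannot be swapped.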

2.4 Assumptions on Parameterized Memory Systems 

In order to reduce the proof of sequential consistency of the parameterized memory 
system to finite-state model checking obligations, we make some assumptions about 
memory systems. We first state a few additional definitions. Let $\sigma$ be a run of the memory 
system $M(n, m, v)$. We denote by $\sigma|_j$ the run $\sigma$ restricted to the $j$th memory location. 
Formally, we have $\sigma|_j = [\sigma]_\beta$, where $\beta = \{a \mid a \in Act(M(n, m, v)) \text{ and } loc(a) = j\}$. 
For $j > 0$, we denote by $\sigma|_{<j}$ the run $\sigma$ restricted to memory locations numbered less 
than $j$; that is, $\sigma|_{<j} = [\sigma]_\beta$, where $\beta = \{a \mid a \in Act(M(n, m, v)) \text{ and } loc(a) < j\}$. 

Assumption 1 (Location symmetry) Let $\lambda : N_m \to N_m$ be a permutation function on 
the set of memory locations. Extend $\lambda$ to actions, extended actions and extended action 
sequences in the natural way. Then, 

1. for all $\sigma \in \Sigma(C(m, v))$, we have that $\lambda(\sigma) \in \Sigma(C(m, v))$, and 

2. for all $\sigma \in \Sigma(P(m, v))$, we have that $\lambda(\sigma) \in \Sigma(P(m, v))$. 

Assumption 2 (Location monotonicity) 

1. If $\sigma \in \Sigma(C(m, v))$, then for all $j < m$, we have $\sigma|_{<j} \in \Sigma(C(j, v))$. 

2. If $\sigma \in \Sigma(P(m, v))$, then for all $j < m$, we have $\sigma|_{<j} \in \Sigma(P(j, v))$. 

Assumption 3 (Data symmetry) Let $\lambda : N_v \cup \{\bot\} \to N_v \cup \{\bot\}$ be any function on 
the set of data values, such that $\lambda(x) = \bot$ iff $x = \bot$. Extend $\lambda$ to actions, extended 
actions and extended action sequences in the natural way. Then, 

1. for all $\sigma \in \Sigma(C(m, v))$, we have that $\lambda(\sigma) \in \Sigma(C(m, v))$, and 

2. for all $\sigma \in \Sigma(P(m, v))$, we have that $\lambda(\sigma) \in \Sigma(P(m, v))$. 

Assumption 4 (Data monotonicity) For all $m$, $n$, $v_1$, $v_2$, if $v_1 < v_2$, then 

1. for all $\sigma \in Act(C(m, v_1))^*$, we have $\sigma \in \Sigma(C(m, v_1))$ iff $\sigma \in \Sigma(C(m, v_2))$, and 

2. for all $\sigma \in Act(P(m, v_1))^*$, we have $\sigma \in \Sigma(P(m, v_1))$ iff $\sigma \in \Sigma(P(m, v_2))$. 
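To get a feel for the symmetry and monotonicity assumptions, it helps to test them on a toy run set. The sketch below takes, purely as an assumed stand-in for the run set of a memory system, the set of serial sequences of operations `((name, processor), location, value)`, and applies location maps, value maps, and restriction to low-numbered locations. The helper names are hypothetical.

```python
def is_serial(seq):
    """Toy membership test: reads return the latest earlier write to their location."""
    last = {}
    for (name, loc, val) in seq:
        if name[0] == "W":
            last[loc] = val
        elif loc in last and last[loc] != val:
            return False
    return True

def map_locations(seq, mapping):
    """Apply a map on locations to every operation of the sequence."""
    return [(name, mapping[loc], val) for (name, loc, val) in seq]

def map_values(seq, fn):
    """Apply a function on data values to every operation of the sequence."""
    return [(name, loc, fn(val)) for (name, loc, val) in seq]

def restrict_below(seq, j):
    """Keep the operations on locations numbered less than j."""
    return [op for op in seq if op[1] < j]

run = [(("W", 1), 0, 1), (("W", 2), 1, 0), (("R", 2), 0, 1), (("R", 1), 1, 0)]

swapped = map_locations(run, {0: 1, 1: 0})    # a permutation of locations
collapsed = map_values(run, lambda v: 0)      # a non-injective value map
merged = map_locations(run, {0: 0, 1: 0})     # NOT a permutation
```

In this toy model, permuting locations, applying an arbitrary value map, and restricting to a prefix of the locations all preserve membership. Merging two locations does not, which suggests why the location map is required to be a permutation while the data map may be any function.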

Note that the function $\lambda$ in assumption 1 above is a permutation on the set of locations, 
whereas the function $\lambda$ in assumption 3 could be any arbitrary function on the set of data 
values. Let $\lambda_0$ be the function from $Act(M(n, m, v))$ to $Act(M(n, m, v))$, which changes 
the location attribute to 0. Formally, $\lambda_0(a) = a'$ such that $name(a') = name(a)$, 
$loc(a') = 0$, and $val(a') = val(a)$. We extend $\lambda_0$ to extended action sequences in 
the natural way. The observer $\Omega$ is location independent if for all $j$, we have that 
$\Omega(\lambda_0(\sigma|_j)) = \lambda_0(\Omega(\sigma)|_j)$. The observer $\Omega$ is data independent if for every 
function $\lambda : N_v \cup \{\bot\} \to N_v \cup \{\bot\}$ such that $\lambda(x) = \bot$ iff $x = \bot$, we have that 
$\Omega(\lambda(\sigma)) = \lambda(\Omega(\sigma))$. 
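The two independence conditions on an observer can be spot-checked on sample sequences. The sketch below checks them for one sequence at a time, which can refute independence but cannot establish it for all runs. Observers are modeled as Python functions from operation lists to operation lists, `None` stands for the undefined value, and both example observers are hypothetical.

```python
def restrict_loc(seq, j):
    """The sequence restricted to location j."""
    return [op for op in seq if op[1] == j]

def lam0(seq):
    """Change the location attribute of every operation to 0."""
    return [(name, 0, val) for (name, _, val) in seq]

def location_independent_on(observer, seq):
    """Check observer(lam0(seq|j)) == lam0(observer(seq)|j) for each location j in seq."""
    return all(
        observer(lam0(restrict_loc(seq, j))) == lam0(restrict_loc(observer(seq), j))
        for j in {op[1] for op in seq}
    )

def data_independent_on(observer, seq, lam):
    """Check observer(lam(seq)) == lam(observer(seq)) for one value map lam."""
    def rename(s):
        return [(name, loc, lam(val)) for (name, loc, val) in s]
    return observer(rename(seq)) == rename(observer(seq))

def identity_observer(seq):
    return list(seq)

def biased_observer(seq):
    """Moves processor 2 to the front, but only when several locations occur."""
    if len({op[1] for op in seq}) > 1:
        return sorted(seq, key=lambda op: op[0][1] != 2)  # stable sort
    return list(seq)

sample = [(("W", 1), 0, 1), (("R", 2), 0, 0), (("W", 1), 1, 1)]
collapse = lambda v: None if v is None else 0
```

The identity observer passes both checks on `sample`. The biased observer fails the location check, because its reordering of the operations on location 0 depends on whether another location is present in the run.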
Proposition 1. Suppose the parameterized memory system M satisfies assumptions 1-4. 
For all n > 0, the following two statements are equivalent: 

1. There is a location and data independent serializer for M(n, n, 2). 

2. There is a location and data independent serializer for M(n, m, v) for all m > 0 
and v > 0. 

Suppose we fix the number of processors to n. Due to the above proposition it suffices 
to consider only n locations and 2 data values, if the serializer we design is location 
and data independent. Since our objective is to prove sequential consistency for an 
arbitrary number of processors, we give a method based on induction over the number 
of processors for this. The inductive step in the method considers two processors and 
designs a serializer-like function for them. Then an argument similar to the one used in 
proving Proposition 1 will let us show that it is enough to perform the inductive step for 
fixed numbers of memory locations and data values. 

3 Reducing Sequential Consistency to Finite-State Proof 
Obligations 

3.1 Induction on the Set of Processors 

We show how to check sequential consistency of M (n, m, v) for all n > 0 by induction 
over the number of processors. We do not need any of the assumptions 1-4 for the results 
in this section. 

We note that every trace in $\Gamma(M(n,m,v))$ can be obtained by a run in $M(n+1,m,v)$
in which the $(n+1)$-st processor does not perform any output action. Hence
$\Gamma(M(n,m,v))$ is contained in $\Gamma(M(n+1,m,v))$ for all $n$. We would like to analyze
a processor in an environment consisting of an arbitrary number of processors. Hence,
we would like an upper bound on the trace set $\Gamma(M(n,m,v))$ for all $n$. A sufficient
condition for this upper bound is captured by process invariants [KM89]. A function
$I_1$ with two arguments $m$ and $v$ is a possible process invariant for the parameterized
memory system M if for all $m$ and $v$, we have that $I_1(m,v)$ is an I/O-process such that
(1) $Obs(I_1(m,v)) = Obs(M(n,m,v))$ for all $n > 0$ (recall that the set of observable
actions of $M(n,m,v)$ is the same for all $n > 0$), and (2) $I_1(m,v)$ is compatible with
$P(m,v)$.

Definition 3 (Process invariant). Let $I_1$ be a possible process invariant for the
parameterized memory system M. The function $I_1$ is a process invariant of M if the following
condition is true for all $m$ and $v$:

[$A_{I_1}(m,v)$] 1. $C(m,v) \preceq I_1(m,v)$

2. $I_1(m,v) \| P(m,v) \preceq I_1(m,v)$

Proposition 2. Suppose $I_1$ is a process invariant of the parameterized memory system
M. Then, for all $n > 0$, $m > 0$, and $v > 0$, we have that $M(n,m,v) \preceq I_1(m,v)$.

If the parameterized memory system M is sequentially consistent, then by our definition,
there exists an observer $\Omega$ for M such that for every sequence $\sigma$ of memory operations of
$M(n,m,v)$, the function $\Omega$ produces a rearranged sequence $\sigma'$ such that (1) $\sigma'$ is serial,
and (2) $\sigma$ and $\sigma'$ agree on the ordering of the memory operations of each individual
processor. We wish to provide an inductive construction that produces such an observer
for arbitrary $n$. The construction uses the notion of a generalized processor called a
merge invariant, and a witness function that works like an observer for a two-processor
system consisting of the merge invariant and $P(m,v)$.

Recall that $RdWr(m,v)$ is the set of private actions of $P(m,v)$ that represent
read and write operations. For technical reasons, we want the memory operations of
the merge invariant to be named differently than those of $P(m,v)$. Let
$Rd'(m,v) = \{(R', j, k) \mid j < m \text{ and } k < v\}$, and let
$Wr'(m,v) = \{(W', j, k) \mid j < m \text{ and } k < v\}$.
Let $RdWr'(m,v)$ denote the union of $Rd'(m,v)$ and $Wr'(m,v)$. We define the function
prime on $RdWr(m,v)$ by $prime((R,j,k)) = (R',j,k)$ and $prime((W,j,k)) = (W',j,k)$.
We define the function unprime on $RdWr'(m,v)$ by $unprime((R',j,k)) = (R,j,k)$
and $unprime((W',j,k)) = (W,j,k)$. We extend prime and unprime to sequences
of actions in the natural way. We say that the sequence $\sigma' \in RdWr'(m,v)^*$
rearranges the sequence $\sigma \in (RdWr(m,v) \cup RdWr'(m,v))^*$ if $\sigma'$ is an interleaving
of $[\sigma]_{RdWr'(m,v)}$ and $prime([\sigma]_{RdWr(m,v)})$.
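
The renaming functions and the "rearranges" relation can be illustrated with a small sketch. It assumes that operations are encoded as tuples (name, location, value) with names "R", "W", "R'" and "W'", and that the two sequences being interleaved are the primed operations of the run and the primed image of its unprimed operations; both the encoding and the helper names are illustrative assumptions, not part of the paper.

```python
from functools import lru_cache


def prime(op):
    """Rename (R, j, k) to (R', j, k) and (W, j, k) to (W', j, k)."""
    name, loc, val = op
    return (name + "'", loc, val)


def unprime(op):
    """Inverse renaming: drop the prime from the operation name."""
    name, loc, val = op
    return (name.rstrip("'"), loc, val)


def is_primed(op):
    return op[0].endswith("'")


def is_interleaving(xs, ys, zs):
    """True iff zs merges xs and ys while keeping the order inside each."""
    xs, ys, zs = tuple(xs), tuple(ys), tuple(zs)
    if len(xs) + len(ys) != len(zs):
        return False

    @lru_cache(maxsize=None)
    def go(i, j):
        if i == len(xs) and j == len(ys):
            return True
        k = i + j
        if i < len(xs) and xs[i] == zs[k] and go(i + 1, j):
            return True
        return j < len(ys) and ys[j] == zs[k] and go(i, j + 1)

    return go(0, 0)


def rearranges(sigma_prime, sigma):
    """Check the assumed reading of 'sigma_prime rearranges sigma'."""
    primed = [op for op in sigma if is_primed(op)]
    renamed = [prime(op) for op in sigma if not is_primed(op)]
    return is_interleaving(primed, renamed, sigma_prime)


# Hypothetical run mixing unprimed and primed memory operations.
sigma = [("R", 0, 1), ("W'", 0, 1), ("W", 1, 0)]
good = [("W'", 0, 1), ("R'", 0, 1), ("W'", 1, 0)]
bad = [("W'", 1, 0), ("R'", 0, 1), ("W'", 0, 1)]
```

In this example, good moves the primed write ahead of the renamed read but keeps the relative order inside each of the two subsequences, whereas bad reorders the two renamed operations.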

A function $I_2$ with two integer arguments $m$ and $v$ is a possible merge invariant
for the parameterized memory system M if for all $m$ and $v$, we have that $I_2(m,v)$
is an I/O-process such that (1) $Obs(I_2(m,v)) = Obs(P(m,v))$, (2) $RdWr'(m,v) \subseteq
Priv(I_2(m,v))$ and $RdWr(m,v) \cap Priv(I_2(m,v)) = \emptyset$, and (3) $I_2(m,v)$ is
compatible with both $P(m,v)$ and $C(m,v)$. Let $I_1$ be a process invariant of M and $I_2$
be a possible merge invariant of M. A function $\Theta$ from sequences of actions of
$I_2(m,v) \| P(m,v)$ to sequences of actions of $I_2(m,v)$ is a merging function if
$\sigma \in \Sigma(I_2(m,v) \| P(m,v))$ implies $\Theta(\sigma) \in \Sigma(I_2(m,v))$.

Definition 4 (Merge invariant). Let $I_1$ be a process invariant and let $I_2$ be a possible
merge invariant for the parameterized memory system M. The function $I_2$ is a merge
invariant of M with respect to $I_1$ if there exists a merging function $\Theta$ such that the
following two conditions are true for all $m$ and $v$:

[$B1_{I_2}(m,v)$] For every closed run $\sigma$ of $I_2(m,v) \| C(m,v)$, the sequence
$unprime([\sigma]_{RdWr'(m,v)})$ is serial.

[$B2_{I_2,I_1,\Theta}(m,v)$] For every run $\sigma$ of $I_2(m,v) \| P(m,v)$ that can be closed by $I_1(m,v)$,
we have that $\Theta(\sigma)$ rearranges $\sigma$, and $[\sigma]_{Obs(I_2(m,v))} = [\Theta(\sigma)]_{Obs(I_2(m,v))}$.

Note that the I/O-process $I_2(m,v) \| C(m,v)$ is a single-processor memory system. We
say that the merging function $\Theta$ is a witness for $I_2(m,v)$ if $\Theta$ makes condition
$B2_{I_2,I_1,\Theta}(m,v)$ true.
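
The first condition of Definition 4 asks that the primed memory operations of a run, once unprimed, form a serial sequence. The sketch below illustrates that check on one explicit run. It assumes the usual notion of a serial sequence (every read returns the value of the most recent write to the same location, or the undefined value if there is none) and a tuple encoding (name, location, value) of actions; both are assumptions made for illustration, since the definition of "serial" lies outside this passage.

```python
BOTTOM = None  # assumed stand-in for the undefined initial value


def is_serial(ops):
    """True iff every read returns the latest value written to its location."""
    memory = {}
    for name, loc, val in ops:
        if name == "W":
            memory[loc] = val
        elif name == "R":
            if memory.get(loc, BOTTOM) != val:
                return False
        else:
            raise ValueError(f"not a memory operation: {name!r}")
    return True


def unprime(op):
    name, loc, val = op
    return (name.rstrip("'"), loc, val)


def serial_after_unprime(run):
    """Project a run onto its primed reads/writes, unprime them, test seriality."""
    primed_ops = [op for op in run if op[0] in ("R'", "W'")]
    return is_serial([unprime(op) for op in primed_ops])


# Hypothetical closed run: a primed write, an observable action, two primed reads.
example_run = [("W'", 0, 1), ("serialize", 0, 1), ("R'", 0, 1), ("R'", 1, BOTTOM)]
```

A model checker establishes this property for all closed runs at once; the function above only checks a single run and is meant to make the condition concrete.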

Let $I_1$ be a process invariant of M. Suppose $I_2$ is a possible merge invariant of
M, and $\Theta$ is a merging function such that $B1_{I_2}(m,v)$ and $B2_{I_2,I_1,\Theta}(m,v)$ are true
for some $m$ and $v$. For some $n > 0$, consider the process $I_2(m,v) \| M(n,m,v)$, which
can be written as $I_2(m,v) \| P_n(m,v) \| M(n-1,m,v)$. Consider any closed run $\sigma$ of
$I_2(m,v) \| M(n,m,v)$. Clearly there is a run $\sigma'$ of $I_2(m,v) \| P_n(m,v)$ that is closed by
a run of $M(n-1,m,v)$ to produce $\sigma$. Since $I_1$ is a process invariant of M, we have that
$\sigma'$ is closed by a run of $I_1(m,v)$. Therefore, using $\Theta$ we can rearrange $\sigma'$ to obtain a run
$\Theta(\sigma')$ of $I_2(m,v)$ which is closed by a run of $M(n-1,m,v)$. Thus we have managed
to rearrange a closed run of $I_2(m,v) \| M(n,m,v)$ into a closed run of $I_2(m,v) \| M(n-1,m,v)$.
By repeating this procedure we eventually obtain a run of $I_2(m,v) \| C(m,v)$,
which is sequentially consistent by condition $B1_{I_2}(m,v)$. Since every run of $M(n,m,v)$
is also a run of $I_2(m,v) \| M(n,m,v)$, it follows that $n$ applications of $\Theta$ effectively
produce an observer $\Omega$ which is a serializer for $M(n,m,v)$. The existence of such an
observer implies the sequential consistency of $M(n,m,v)$.

Theorem 1. Let M be a parameterized memory system. If $I_1$ is a process invariant of
M and $I_2$ is a merge invariant of M with respect to $I_1$, then M is sequentially consistent.

Suppose that we manage to come up with possible invariants $I_1$ and $I_2$, and a merging
function $\Theta$. How do we verify for all $m$ and $v$ that $A_{I_1}(m,v)$, $B1_{I_2}(m,v)$, and
$B2_{I_2,I_1,\Theta}(m,v)$ hold? In the following two sections, we describe sufficient conditions
whereby proving these obligations for fixed values of $m$ and $v$ will let us conclude that
they hold for all $m$ and $v$.

3.2 Reduction to a Fixed Number of Memory Locations 

In this section, we use assumptions 1 and 2 on the parameterized memory system. Further, 
we impose requirements on the process and merge invariants and the merging function 
that will reduce the verification problem to one on a fixed number of memory locations. 
The first two requirements are identical to assumptions 1 and 2 on the parameterized 
memory system. 

Requirement 1 (Location symmetry) Let $\lambda : N_m \to N_m$ be a permutation function
on the set of memory locations. Extend $\lambda$ to actions, extended actions and extended
action sequences in the natural way. We require for the possible process invariant $I_1$
and the possible merge invariant $I_2$ that

1. for all $\sigma \in \Sigma(I_1(m,v))$, we have that $\lambda(\sigma) \in \Sigma(I_1(m,v))$, and

2. for all $\sigma \in \Sigma(I_2(m,v))$, we have that $\lambda(\sigma) \in \Sigma(I_2(m,v))$.

Requirement 2 (Location monotonicity) We require for the possible process invariant
$I_1$ and the possible merge invariant $I_2$ that

1. if a G S[Ii{m,v)) then for all j < m, we have a\^j G B{Ii{j,v)), and 

2. if a G S[l 2 {m,v)) then for all j < m, we have a\^j G S{l 2 {j,v)). 

For any run $\sigma$ of $I_2(m,v)$, we define $tr'(\sigma)$ as the restriction of $\sigma$ to $Obs(I_2(m,v)) \cup
RdWr'(m,v)$. Let $\Gamma'(I_2(m,v))$ be the set $\{tr'(\sigma) \mid \sigma \in \Sigma(I_2(m,v))\}$. Recall that $\lambda_0$
is a function that changes the location attribute of an action to 0.

Requirement 3 (Location independence) We require for the possible process invariant
$I_1$ and the possible merge invariant $I_2$ that

1. $\sigma \in \Gamma(I_1(m,v))$ if $\sigma \in Act(I_1(m,v))^*$, and for all $0 \le j < m$, we have
$\lambda_0(\sigma|_j) \in \Gamma(I_1(1,v))$, and

2. $\sigma \in \Gamma'(I_2(m,v))$ if $\sigma \in Act(I_2(m,v))^*$, and for all $0 \le j < m$, we have
$\lambda_0(\sigma|_j) \in \Gamma'(I_2(1,v))$.

Consider a merging function $\Theta$. We say that $\Theta$ is location independent if whenever
$\sigma \in \Sigma(I_2(m,v) \| P(m,v))$, then $\Theta(\lambda_0(\sigma|_j)) = \lambda_0(\Theta(\sigma)|_j)$ for all $j < m$.

Theorem 2. Let M be a parameterized memory system satisfying assumptions 1 and 2.
Let $I_1$ be a possible process invariant and let $I_2$ be a possible merge invariant for M
satisfying requirements 1-3. Then the following conditions hold for all $v > 0$:

1. $A_{I_1}(1,v)$ is true iff $A_{I_1}(m,v)$ is true for all $m > 0$.

2. $B1_{I_2}(1,v)$ is true iff $B1_{I_2}(m,v)$ is true for all $m > 0$.

3. There is a location-independent witness $\Theta$ satisfying $B2_{I_2,I_1,\Theta}(l,v)$ for $l \le 3$ iff
there is a location-independent witness satisfying $B2_{I_2,I_1,\Theta}(m,v)$ for all $m > 0$.

The condition $l \le 3$ in the last item of the above theorem comes from the fact that a
witness $\Theta$ needs to preserve three orderings while rearranging a run of $I_2(m,v) \| P(m,v)$:
(1) the order of memory operations in $I_2(m,v)$, (2) the order of memory operations
in $P(m,v)$, and (3) the order of observable actions in $I_2(m,v) \| P(m,v)$. If $\Theta$ does not
preserve these orderings, and if $\Theta$ is location independent, we can prove that there exists a
run $\sigma$ of $I_2(3,v) \| P(3,v)$ such that either $\Theta(\sigma)$ does not rearrange $\sigma$, or
$[\sigma]_{Obs(I_2(3,v))} \ne [\Theta(\sigma)]_{Obs(I_2(3,v))}$.

3.3 Reduction to a Fixed Number of Data Values 

In this section, we assume that the memory system satisfies assumptions 3 and 4. Recall
the definition of a data-independent observer.

Theorem 3. Let M be a parameterized memory system satisfying assumptions 3 and
4. For all $n > 0$, $m > 0$, and $v > 0$, if $\Omega$ is a data-independent observer for the
memory system $M(n,m,v)$, then $\Omega$ is a serializer for $M(n,m,2)$ iff $\Omega$ is a serializer
for $M(n,m,v)$.

Consider a merging function $\Theta$. We say that $\Theta$ is data independent if for all $v$, and
for every function $\lambda : N_v \cup \{\bot\} \to N_v \cup \{\bot\}$ such that $\lambda(x) = \bot$ iff $x = \bot$, we
have that $\Theta(\lambda(\sigma)) = \lambda(\Theta(\sigma))$. Suppose that the witness for $B2_{I_2,I_1,\Theta}(m,v)$ is data
independent. Then the implicit observer function that is produced for $M(n,m,v)$ as a
result of $n$ applications of the witness is also data independent.

Corollary 1. Let M be a parameterized memory system satisfying assumptions 1-4.
Let $I_1$ be a possible process invariant and let $I_2$ be a possible merge invariant for M
satisfying requirements 1-3. Let $\Theta$ be a location and data independent merging function.
Suppose $A_{I_1}(1,2)$ and $B1_{I_2}(1,2)$ are true, and $\Theta$ is a witness for $B2_{I_2,I_1,\Theta}(3,2)$. Then
$M(n,m,v)$ is sequentially consistent for all $n > 0$, $m > 0$, and $v > 0$; that is, M is
sequentially consistent.

4 Two Applications: Lazy Caching and Snoopy Coherence 

We show how the theory developed in the previous section can be used to verify sequential
consistency of memory systems with an arbitrary number of processors, locations
and data values using a model checker. We consider two specific memory protocols,
namely the lazy caching protocol from [ABM93] and a snoopy cache-coherence protocol
from [HP96].

For each of these protocols, we first argue that assumptions 1-4 are satisfied by
the memory system, and that requirements 1-3 are satisfied by the process and merge
invariants. Then, we design a witness $\Theta$ and argue that it is location and data independent.
The following observations provide justification for our informal arguments:

- The invariants and the witness have the property that they never base their decisions 
on data values. Thus, they are data independent by design. 

- The memory system inherently enforces a total order on the writes to every location.
In fact, every memory system we know of has this property. Our merge witness
respects this total order for every location. Let M be a parameterized memory
system and let $\Theta$ be a merging function. Let $\sigma$ be a run of M and let $j$ be any
location. The order of writes in $\Theta(\sigma)|_j$ is the same as the total order of writes to
location $j$ in $\sigma$. Every read to a location reads the value written to that location
by some earlier write. The witness also respects this causal relationship between
the writes and the reads. If two reads of location $j$ access the value written by the
same write, then the witness places them in their temporal order of occurrence in
$\Theta(\sigma)|_j$. Thus, the ordering of events to a location $j$ is independent of the events to
other memory locations and determined solely by the temporal sequence of events
to location $j$. Hence, our witness is naturally location independent.

We finally discharge the three proof obligations of Corollary 1 using our model checker
MOCHA.

Lazy Caching. The lazy caching protocol allows a processor to complete a write in its 
local cache and proceed even while other processors continue to access the “old” value 
of the data in their local caches. Each cache has an output queue in which writes are 
buffered and an input queue in which reads are buffered. In order to satisfy sequential 
consistency, some restrictions are placed on read accesses to a cache when either writes 
or updates are buffered. A complete description of the protocol can be found in [ABM93] . 

The I/O-process $C(m,v)$ for this protocol is the trivial process with a single state
that accepts all input actions. The I/O-process $P(m,v)$ is a description of one processor
and cache in the system. The set $Priv(P(m,v))$ has actions with three different names:
read, write, and update. An update action occurs when a write gets updated to a local
cache from the head of its input queue. There is one action for each combination of these
names with locations, processors and data values, for a total of $3 \times n \times m \times v$ private
actions. The set $Obs(P(m,v))$ has actions with one name: serialize. A serialize action
occurs when a processor takes the result of a local write from the head of its output
queue and transmits it on the bus. The serialize action does not identify the processor
which did the action. Thus, a processor has $m \times v$ different observable actions.

The process invariant $I_1$ is such that for all $m$ and $v$, the I/O-process $I_1(m,v)$ simply
generates all possible sequences of serialize actions. It is trivial to see that $I_1$ is a process
invariant. The merge invariant $I_2$ is exactly the same as $P$. The merging function $\Theta$ is
non-trivial. It queues write actions and delays them until the corresponding update action
is seen by all processors. It also delays read actions until the corresponding write has
been made visible. The witness preserves processor order, never bases decisions on data
values, and respects the total order of writes that is inherent to the lazy-caching protocol.
By design, the witness is location and data independent.

We used MOCHA [AHM+98] to verify that the merging function $\Theta$ is a witness for
the merge invariant for three locations and two data values. This obligation had about 60
latches and required MOCHA about 4 hours to check on a 625 MHz DEC Alpha 21164.

Snoopy Cache Coherence. The snoopy coherence protocol has a bus on which all 
caches send messages, as well as “snoop” and react to messages. Each location has a 
state machine which is in one of three states: read-shared, write-exclusive, or invalid. 
If a location is in read-shared state, then a cache has permission to read the value. 
If a location is in write-exclusive state, then a cache has permission to both read and 
write the value. In order to transition to read-shared or write-exclusive states, the cache 
sends messages over the bus, and other caches respond to these messages. There is also 
a central memory attached to the bus. When a location is not held in write-exclusive by 
any cache, the memory owns that location and responds to read requests for that location. 
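
The permissions attached to the three cache states can be summarized in a small sketch. The enum and function names below are illustrative assumptions; only the states and the permission rules themselves come from the description above.

```python
from enum import Enum


class LineState(Enum):
    INVALID = "invalid"
    READ_SHARED = "read-shared"
    WRITE_EXCLUSIVE = "write-exclusive"


def can_read(state):
    """A cache may read a location held in read-shared or write-exclusive."""
    return state in (LineState.READ_SHARED, LineState.WRITE_EXCLUSIVE)


def can_write(state):
    """A cache may write a location only when it holds it write-exclusive."""
    return state is LineState.WRITE_EXCLUSIVE


def memory_owns(states_in_caches):
    """The central memory owns a location if no cache holds it write-exclusive."""
    return all(s is not LineState.WRITE_EXCLUSIVE for s in states_in_caches)
```

The bus messages that move a location between these states are not modeled here, since the text does not spell out their exact rules.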

The I/O-process $C(m,v)$ for this protocol models the central memory, and $P(m,v)$
models one processor with a local cache. The process $C(m,v)$ has no private actions.
It has observable actions with four different names: read-request, write-request,
read-response, and write-response. The process $P(m,v)$ has private actions with two different
names: read and write, and the same set of observable actions as $C(m,v)$. None of the
observable actions identify the processor that did the action.

The process invariant is such that for all $m$ and $v$, we have that $I_1(m,v)$ is a
generalization of the processor $P(m,v)$. The processor is generalized so that it can send a
read-request for a location even if it already has the location in read-shared, and a
write-request even if it already has the location in write-exclusive. The merge invariant $I_2$ is
identical to $I_1$ with the additional capability to execute private read and write actions.
The merging function $\Theta$ preserves temporal order of occurrence of reads and writes.
This simple witness works because the snoopy protocol implements coherence. Again,
by design the witness is data and location independent. We used MOCHA to verify that
$I_1(1,v)$ is a process invariant and $I_2(3,v)$ is a merge invariant. MOCHA required less
than 15 minutes to check these.

Abstract. In this paper we present an efficient technique for symbolic
model checking of any CTL formula with respect to a state/event system. 
Such a system is a concurrent version of a Mealy machine and is used 
to describe embedded reactive systems. The technique uses compositio- 
nality to find increasingly better upper and lower bounds of the solution 
to a CTL formula until an exact answer is found. Experiments show 
this approach to succeed on examples larger than the standard back- 
wards traversal can handle, and even in many cases where both methods 
succeed it is shown to be faster. 



1 Introduction 

The range of systems that can be formally verified has improved drastically 
since the introduction of symbolic model checking [7,8] with the use of reduced 
and ordered binary decision diagrams (ROBDD) in the eighties [3,2]. Since then 
many people have improved on the basic algorithms by introducing more efficient 
techniques, more compact representations and new methods for simplifying the 
models. 

One way to do simplifications on the model is by abstraction, where sub- 
components of the system considered are removed to yield a smaller and simpler 
model on which the verification can be done. The technique described here is 
based on one such incremental abstraction of the system, where first an initially 
small subset of the system is used as an abstraction. If this set is not enough to 
prove or disprove the requirements then the set is incrementally extended until 
a point where it is possible to give an exact positive or negative answer. 

The experimental results are promising and show that the iterative approach 
can be used with success on larger systems than the usual backwards traversal 
can handle, and it is still faster even when the usual method is feasible. 

This work is a direct extension of the work presented at TACAS '98 [11]. Now
the technique covers full CTL model checking and simultaneously calculates both
an upper and a lower approximation to the solution of the CTL formula.

We apply this technique to the state/event model used in the commercial
tool visualSTATE™ [13]. This model is a concurrent version of Mealy machines,
that is, it consists of a fixed number of concurrent finite state machines that
have pairs of input events and output actions associated with the transitions of
the machines. The model is synchronous: each input event is reacted upon by
all machines in lock-step; the total output is the multi-set union of the output
actions of the individual machines. Further synchronization between the machines
is achieved by associating a guard with the transitions. Guards are Boolean
combinations of conditions on the local states of the other machines.

Both the state space and the transition relation of the state/event system
are represented using ROBDDs with a partitioned transition relation, exactly as
described in [11].

1.1 Related Work 

Another similar technique for exploiting the structure of the system to be verified 
is described in [9] . This technique also computes increasingly better approxima- 
tions to the exact solution of a CTL formula, but it differs from our approach 
in several ways. Instead of reusing the previously found states as shown in sec- 
tion 6, this technique has to recalculate the approximation from scratch each 
time a new component is added to the system. It may also have to include all 
components in the system, whereas we restrict our inclusion to only the depen- 
dency closure (cone of influence) of the formula used. Finally the technique is 
restricted to ACTL(ECTL) formulae, whereas our technique can be used with 
any CTL formula. 

Pardo and Hachtel describe an abstraction technique in [14] where the
approximations are done using ROBDD reduction techniques. This technique is
based on the $\mu$-calculus (and so includes CTL). It utilizes the structure of a
given formula to find appropriate abstractions, whereas our technique depends
on the structure of the model.

The technique for showing language containment of L-processes described
in [1] also maintains a subset of processes used in the verification and analyzes
error traces from the verifier to find new processes in order to extend this set.
Although the overall goal of finding the result through an abstract, simplified
model is the same as ours, the properties verified are different and the L-processes
have properties quite different from ours.

Abstraction as in [5] is similar to ours when the full dependency closure
is included from the beginning, and thus it discards the idea of an iterative
approach.

The idea of abstraction and compositionality is explored in more detail in
David Long's thesis [12].

2 State/Event Systems 

A state/event system $S$ consists of $n$ machines $M_1, \ldots, M_n$, an alphabet $E$
of input events and an alphabet $O$ of outputs. Each machine $M_i$ is a triple
$(S_i, s_i^0, T_i)$ of local states $S_i$, an initial state $s_i^0 \in S_i$, and a set of transitions $T_i$.
The set of transitions is a relation

$T_i \subseteq S_i \times E \times G_i \times \mathcal{M}(O) \times S_i$,

Fig. 1. Two state/event machines and the corresponding parallel combination. The
small arrows indicate the initial states.

where $\mathcal{M}(O)$ is a multi-set of outputs, and $G_i$ is the set of guards not containing
references to machine $i$. These guards are generated from the following simple
grammar for Boolean expressions:

$g ::= l_j = p \mid \neg g \mid g \wedge g \mid tt$.

The atomic predicate $l_j = p$ is read as "machine $j$ is at local state $p$" and
$tt$ denotes a true guard. The global state set $S$ of the state/event system is the
product of the local state sets: $S = S_1 \times S_2 \times \cdots \times S_n$. The guards are interpreted
straightforwardly over $S$ as given by a satisfaction relation $s \models g$. The expression
$l_j = p$ holds for any $s \in S$ exactly when the $j$-th component of $s$ is $p$, i.e., $s_j = p$.
The interpretation of the other cases is as usual. The transition relation is total,
by assuming that the system stays in its current state when no transitions are
enabled for a given input event.

Considering a global state $s$, all guards in the transition relation can be
evaluated. We define a version of the transition relation in which the guards
have been evaluated. This relation is denoted $s \xrightarrow{e\,o}_i s_i'$, expressing that machine
$i$ when receiving input event $e$ makes a transition from $s_i$ to $s_i'$ and generates
output $o$ (here $s_i$ is the $i$-th component of $s$). Formally,

$s \xrightarrow{e\,o}_i s_i' \iff_{def} \exists g.\ (s_i, e, g, o, s_i') \in T_i \text{ and } s \models g$.

The transition relation of the whole system is defined as:

$s \xrightarrow{e\,o} s' \iff_{def} \forall i.\ s \xrightarrow{e\,o_i}_i s_i'$ where $s' = (s_1', \ldots, s_n')$ and $o = o_1 \uplus \cdots \uplus o_n$,

where $\uplus$ denotes multi-set union. The example in figure 1 shows a system with
two state/event machines and the corresponding parallel combination. Machine
$M_2$ starts in state $q_0$ and goes to state $q_1$ on the receipt of event $e_1$. Machine $M_1$
cannot move on $e_1$ because of the guard. After this $M_2$ may go back on event
$e_2$ or $M_1$ may enter state $p_1$ on event $e_1$. At last both $M_1$ and $M_2$ can return to
their initial states on event $e_2$.
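
The guard semantics and the synchronous step can be illustrated with an explicit-state sketch. Everything concrete in it is an assumption made for illustration: guards are encoded as nested tuples, machines are indexed from 0, a machine with no enabled transition is taken to stay in its local state with no output, and the two example machines are only a guess at figure 1 that reproduces the behaviour described in the text (the output "o1" is invented).

```python
from collections import Counter
from itertools import product


def holds(guard, state):
    """Evaluate a guard built from ("tt",), ("at", j, p), ("not", g), ("and", g, h)."""
    kind = guard[0]
    if kind == "tt":
        return True
    if kind == "at":
        return state[guard[1]] == guard[2]
    if kind == "not":
        return not holds(guard[1], state)
    if kind == "and":
        return holds(guard[1], state) and holds(guard[2], state)
    raise ValueError(f"unknown guard: {guard!r}")


def local_moves(transitions, i, state, event):
    """Enabled moves of machine i as (target, outputs); stay put if none is enabled."""
    moves = [(dst, tuple(out))
             for (src, ev, guard, out, dst) in transitions
             if src == state[i] and ev == event and holds(guard, state)]
    return moves or [(state[i], ())]


def step(machines, state, event):
    """All (next global state, output multiset) pairs for one input event."""
    per_machine = [local_moves(t, i, state, event) for i, t in enumerate(machines)]
    results = set()
    for combo in product(*per_machine):
        nxt = tuple(dst for dst, _ in combo)
        outputs = Counter()
        for _, out in combo:
            outputs.update(out)  # multi-set union of the local outputs
        results.add((nxt, tuple(sorted(outputs.items()))))
    return results


# Hypothetical two-machine system: index 0 plays M1, index 1 plays M2.
M1 = [("p0", "e1", ("at", 1, "q1"), [], "p1"), ("p1", "e2", ("tt",), [], "p0")]
M2 = [("q0", "e1", ("tt",), ["o1"], "q1"), ("q1", "e2", ("tt",), [], "q0")]
machines = [M1, M2]
```

With these machines, step(machines, ("p0", "q0"), "e1") moves only the second machine, because the guard of the first one refers to local state q1, which has not been reached yet.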

3 CTL Specifications 

CTL [6] is a temporal logic used to specify formal requirements to finite state
machines like the state/event systems presented here. Such specifications consist
of the Boolean constant true $tt$, conjunction $\wedge$, negation $\neg$, state predicates and
temporal operators. We use a state predicate variant where the location of a
machine is stated: $l_i = s$, meaning that machine $i$ should be in state $s$, similar to
the guards.

The temporal operators are the next operator $X(\phi)$, the future operator $F(\phi)$,
the globally operator $G(\phi)$ and the until operator $(\phi_1\, U\, \phi_2)$. Each of these
operators must be directly preceded by a path quantifier to state whether the
formula should hold for all possible execution paths of the system (A) or only
for at least one execution path (E).

The solution to a CTL formula $\phi$ is the set of states $[\![\phi]\!]$ that satisfy the
formula $\phi$. A state/event system $S$ is said to satisfy the property, $S \models \phi$, if the
initial state is in the solution to $\phi$, $s^0 \in [\![\phi]\!]$.

To describe the exact semantics of the operators we use an additional function
$[\![EX]\!]$ that operates on a set of states $P$. This function returns all the states that
may reach at least one of the states in $P$ in exactly one step and is defined as
$[\![EX]\!] P = \{s \in S \mid \exists e, o, s'.\ s \xrightarrow{e\,o} s' \wedge s' \in P\}$. The operators are then defined,
for a state/event system with the transition relation $s \xrightarrow{e\,o} s'$, as:

$[\![tt]\!] = S$
$[\![l_i = s]\!] = \{s' \in S \mid s'_i = s\}$
$[\![\neg\phi]\!] = S \setminus [\![\phi]\!]$
$[\![\phi_1 \wedge \phi_2]\!] = [\![\phi_1]\!] \cap [\![\phi_2]\!]$
$[\![EX\,\phi]\!] = [\![EX]\!]\,[\![\phi]\!]$
$[\![EG\,\phi]\!] = \nu U \subseteq S.\ [\![\phi]\!] \cap [\![EX]\!] U$
$[\![E(\phi_1\,U\,\phi_2)]\!] = \mu U \subseteq S.\ [\![\phi_2]\!] \cup ([\![\phi_1]\!] \cap [\![EX]\!] U)$

Here we use $\nu x.f(x)$ and $\mu x.f(x)$ as the maximal and minimal fixed points of a
monotone function $f$ on a complete lattice, as given by Tarski's fixed point theorem [16].
The rest of the operators can be defined using the above operators [8].
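
The fixed-point definitions can be read as simple iterations. The sketch below works on explicit sets of states rather than ROBDDs and assumes the transition relation is given as a dictionary from each state to its set of successors, with events and outputs abstracted away; the function names are illustrative.

```python
def ex(succ, states, P):
    """States with at least one successor in P."""
    return {s for s in states if succ[s] & P}


def eg(succ, states, phi):
    """Greatest fixed point of U = phi & EX(U), iterated down from all states."""
    U = set(states)
    while True:
        new = phi & ex(succ, states, U)
        if new == U:
            return U
        U = new


def eu(succ, states, phi1, phi2):
    """Least fixed point of U = phi2 | (phi1 & EX(U)), iterated up from the empty set."""
    U = set()
    while True:
        new = phi2 | (phi1 & ex(succ, states, U))
        if new == U:
            return U
        U = new


# Small hypothetical transition system with a total transition relation.
states = {0, 1, 2, 3}
succ = {0: {1}, 1: {0, 2}, 2: {2}, 3: {3}}
```

On this system, the states 0 and 1 can stay inside {0, 1} forever, and state 3 can never reach state 2.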



4 Bounded CTL Solutions 

In this section we introduce the bounded CTL solution. A bounded CTL solution
consists of two sets of states, namely $\mathcal{L}[\![\phi]\!]_I$ and $\mathcal{U}[\![\phi]\!]_I$, which are lower and upper
approximations to the solution of the formula. The idea is to test for inclusion
of the initial state in $\mathcal{L}[\![\phi]\!]_I$ and exclusion from $\mathcal{U}[\![\phi]\!]_I$. In the first case we know
that the formula holds and in the second that it does not.

To describe bounded CTL solutions we need to formalize the concept of
dependency between machines in a state/event system. We choose the notion
that one machine $M_i$ depends on another machine $M_j$ if there exists at least one
guard in $M_i$ that has a syntactic reference to a state in $M_j$. These dependencies
form a directed graph, which we call the dependency graph. In this graph each
vertex represents a machine and an edge from a vertex $i$ to a vertex $j$ represents
a dependency in machine $M_i$ on a state in machine $M_j$. Note that we can ignore
any dependencies introduced by the global synchronization of the input events.

A formula $\phi$ depends on a machine $M_i$ if $\phi$ contains a sub-formula of the
form $l_i = s$, and the sort of $\phi$ is all the machines $\phi$ depends on. The dependency
closure of a machine $M_i$ is all the machines that are reachable from $M_i$ in the
dependency graph, including $M_i$. This is also sometimes referred to as the cone
of influence. The dependency closure of a formula $\phi$ is the union of all the
dependency closures of the machines in the sort of $\phi$.
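
The dependency closure is an ordinary reachability computation on the dependency graph. A minimal sketch, assuming the graph is given as a dictionary from a machine index to the set of machines it depends on (this representation is an assumption):

```python
from collections import deque


def dependency_closure(graph, roots):
    """All machines reachable from the given machines, including the roots."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        i = queue.popleft()
        for j in graph.get(i, ()):
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return seen


# Hypothetical dependency graph: machine 1 depends on 2, 2 on 3, and 4 on 1.
graph = {1: {2}, 2: {3}, 3: set(), 4: {1}}
```

Passing the sort of a formula as roots gives the dependency closure of that formula, since the closure of a set of machines is the union of the closures of its members.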


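Assume (for the time being) that we have an efficient way to calculate a
bounded solution to a CTL formula $\phi$ using only the machines in an index set
$I$. The result should be two sets of states $\mathcal{L}[\![\phi]\!]_I$ and $\mathcal{U}[\![\phi]\!]_I$ with the following
properties:

$\mathcal{L}[\![\phi]\!]_I \subseteq [\![\phi]\!] \subseteq \mathcal{U}[\![\phi]\!]_I$    (1)

$\mathcal{L}[\![\phi]\!]_{I_1} \subseteq \mathcal{L}[\![\phi]\!]_{I_2}$ if $I_1 \subseteq I_2$    (2)

$\mathcal{U}[\![\phi]\!]_{I_2} \subseteq \mathcal{U}[\![\phi]\!]_{I_1}$ if $I_1 \subseteq I_2$    (3)

$\mathcal{L}[\![\phi]\!]_I = [\![\phi]\!] = \mathcal{U}[\![\phi]\!]_I$ if $I$ is dependency closed.    (4)

Both $\mathcal{L}[\![\phi]\!]_I$ and $\mathcal{U}[\![\phi]\!]_I$ are only defined for sets $I$ that include the sort of $\phi$.

Property (1) means that $\mathcal{L}[\![\phi]\!]_I$ is a lower approximation of $[\![\phi]\!]$ and $\mathcal{U}[\![\phi]\!]_I$ is an
upper approximation of $[\![\phi]\!]$. Properties (2) and (3) mean that the approximations
converge monotonically towards the correct solution of $\phi$, and property (4) states
that we get the correct solution to $\phi$ when $I$ contains all the machines found in
the dependency closure of $\phi$.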

In section 5 we will show an algorithm that efficiently computes the bounded
solution to any CTL formula. With this, it is possible to make a serious
improvement to the usual algorithm for finding CTL solutions. The algorithm utilizes
the fact that we may be able to prove or disprove the property $\phi$ using only a
(hopefully) small subset of all the machines in the system.

Our algorithm for verifying a CTL property is as follows:

Algorithm CTL verifier

Input: A CTL formula $\phi$ and a state/event system $S$ and its dependency
graph $G$

Output: true if $s^0 \in [\![\phi]\!]$ and false otherwise

1. $I$ = sort($\phi$); result = unknown

2. repeat

3. calculate $\mathcal{L}[\![\phi]\!]_I$ and $\mathcal{U}[\![\phi]\!]_I$

4. if $s^0 \in \mathcal{L}[\![\phi]\!]_I$ then result = true

5. if $s^0 \notin \mathcal{U}[\![\phi]\!]_I$ then result = false

6. $I = I \cup$ extend($I$, $G$)

7. until result $\ne$ unknown

First we set $I$ to be the sort of $\phi$ and use this to calculate $\mathcal{L}[\![\phi]\!]_I$ and $\mathcal{U}[\![\phi]\!]_I$. If
$s^0 \in \mathcal{L}[\![\phi]\!]_I$, then we know, from property (2), that $\phi$ holds for the system, and if
$s^0 \notin \mathcal{U}[\![\phi]\!]_I$ then we know, from property (3), that $\phi$ does not hold. If neither is
the case then we add more machines to $I$ and try again. This continues until $\phi$
is either proved or disproved. The algorithm is guaranteed to stop with either a
false or a true result when $I$ is the dependency closure of $\phi$, in which case we have
$\mathcal{L}[\![\phi]\!]_I = \mathcal{U}[\![\phi]\!]_I$ from property (4), and thus either $s^0 \in \mathcal{L}[\![\phi]\!]_I$ or $s^0 \notin \mathcal{U}[\![\phi]\!]_I$.
The function extend selects a new set of machines to be included in $I$. We
have chosen to include new machines in a breadth-first manner, so that extend
returns all machines reachable in $G$ from $I$ in one step.
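
The control flow of the verifier can be sketched as a short loop. The bounded solution itself is not implemented here: bounded_solution is a hypothetical callback that returns the pair of lower and upper approximations for a given index set, and the explicit-set representation and the guard against a non-converging callback are assumptions added for illustration.

```python
def extend(I, graph):
    """Machines reachable from I in one step of the dependency graph."""
    return {j for i in I for j in graph.get(i, ())}


def verify(sort, graph, initial_state, bounded_solution):
    """Grow the index set I breadth-first until the bounds decide the property."""
    I = set(sort)
    while True:
        lower, upper = bounded_solution(I)
        if initial_state in lower:
            return True
        if initial_state not in upper:
            return False
        grown = I | extend(I, graph)
        if grown == I:
            # Cannot happen if the callback satisfies property (4).
            raise RuntimeError("I is dependency closed but the bounds did not meet")
        I = grown


def demo_bounds(I):
    """Stub callback: undecided for I = {1}, exact once machine 2 is included."""
    if {1, 2} <= I:
        return {"s0"}, {"s0"}
    return set(), {"s0"}
```

For instance, verify({1}, {1: {2}, 2: set()}, "s0", demo_bounds) needs one extension of I before the lower approximation contains the initial state.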

5 Bounded CTL Calculation 

In section 4 we showed how to verify a CTL formula by using only a minimal
set of machines $I$ from a state/event system and using an efficient algorithm for
the calculation of lower and upper approximations of $[\![\phi]\!]$. In this section we will
show one such algorithm, and for this we need some more definitions. Relating
to an index set $I$ of machines, two states $s, s' \in S$ are $I$-equivalent, written as
$s =_I s'$, if for all $i \in I$, $s_i = s'_i$. A set of states $P$ is $I$-sorted if the following
predicate is true:

$I\text{-sorted}(P) \iff \forall s, s' \in S.\ (s \in P \wedge s =_I s') \Rightarrow s' \in P$.

This means that if a state $s$ is in $P$ then all other states $s'$ which are $I$-equivalent
to $s$ must also be in the set $P$. This is equivalent to saying that a set $P$ is $I$-sorted
if it is independent of the machines in the complement $\bar{I} = \{1, \ldots, n\} \setminus I$.

Consider as an example two machines with the sets of states $S_0 = \{p_0, p_1\}$,
$S_1 = \{q_0, q_1, q_2\}$ and a sort $I = \{0\}$. Now the two pairs of states $(p_0, q_1)$
and $(p_0, q_2)$ are $I$-equivalent because their first states match. The set $P =
\{(p_0, q_0), (p_0, q_1), (p_0, q_2)\}$ is $I$-sorted because it is independent of the states in $S_1$.
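
The two definitions translate directly into checks on explicit state sets. The sketch below uses the example as far as it can be read from the text (two machines with local states p0, p1 and q0, q1, q2, and I = {0}); the tuple encoding of global states and the function names are assumptions.

```python
from itertools import product


def i_equivalent(s, t, I):
    """Two global states are I-equivalent if they agree on every machine in I."""
    return all(s[i] == t[i] for i in I)


def is_i_sorted(P, all_states, I):
    """P is I-sorted if it contains every state I-equivalent to one of its members."""
    return all(t in P
               for s in P
               for t in all_states
               if i_equivalent(s, t, I))


S0 = ["p0", "p1"]
S1 = ["q0", "q1", "q2"]
all_states = set(product(S0, S1))
I = {0}
P = {("p0", "q0"), ("p0", "q1"), ("p0", "q2")}
```

Removing any one of the three states from P breaks the property, because the remaining states still have an I-equivalent state outside the set.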

The bounded calculation of the constants $tt$ and $l_i = s$, negation and
conjunction is straightforward as shown in figure 2. The results are clearly $I$-sorted
and satisfy the properties in section 4, if the sub-expressions $\phi$, $\phi_1$ and $\phi_2$ do
so.



Next State Operator: To show how to find $\mathcal{L}[\![EX\,\phi]\!]_I$ and $\mathcal{U}[\![EX\,\phi]\!]_I$ we
introduce two new operators: $[\![E\forall X]\!]_I$ and $[\![E\exists X]\!]_I$. The lower approximation
$[\![E\forall X]\!]_I P$ is a conservative approximation to $[\![EX]\!]$ that only includes states that
are guaranteed to reach $P$ in one step, regardless of the states of the machines
in $\bar{I}$. The upper approximation $[\![E\exists X]\!]_I P$ is an optimistic approximation that
includes all states that just might reach $P$. These two operators are defined as:

$[\![E\forall X]\!]_I P = \{s \in S \mid \forall s' \in S.\ s =_I s' \Rightarrow s' \in [\![EX]\!]_I P\}$
$[\![E\exists X]\!]_I P = \{s \in S \mid \exists s' \in S.\ s =_I s' \wedge s' \in [\![EX]\!]_I P\}$

where the results of both operators are $I$-sorted when $[\![EX]\!]_I P$ is $I$-sorted, as a
result of the extra quantifiers. The calculation of $[\![EX]\!]_I P$ can be done efficiently
when $P$ is $I$-sorted, using a partitioned transition relation [4]. The definition of
$[\![EX]\!]_I$ is

[EX]/ P = {s e s \ 3e,o, s', s^s' As' e P}. 
This seems to depend on all n machines in S, but as a result of P being I-sorted,
it can be reduced to

    $[\![EX]\!]_I P = \{ s \in S \mid \exists e, o.\ \exists s'_I.\ s \xrightarrow{e\ o}_I s'_I \land (s'_1, \ldots, s'_n) \in P \}$

where $\exists s'_I.\ s \xrightarrow{e\ o}_I s'_I$ means $\bigwedge_{i \in I} s_i \xrightarrow{e\ o} s'_i$. This clearly depends on only
the transition relations for the machines in I.
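The two lifted next-state operators can be illustrated by explicit enumeration. The sketch below is an assumption-laden toy (global transitions given as pairs of state tuples, events and outputs omitted, helper names invented here); it is not the partitioned ROBDD computation referred to above.

```python
from itertools import product

def equivalent(s, t, sort):
    """I-equivalence: s and t agree on all machine indices in `sort`."""
    return all(s[i] == t[i] for i in sort)

def ex(P, trans):
    """Plain pre-image: states with at least one successor in P."""
    return {s for (s, t) in trans if t in P}

def e_forall_x(P, trans, states, sort):
    """Lower operator: every I-equivalent state must be able to step into P."""
    pre = ex(P, trans)
    return {s for s in states
            if all(t in pre for t in states if equivalent(s, t, sort))}

def e_exists_x(P, trans, states, sort):
    """Upper operator: some I-equivalent state can step into P."""
    pre = ex(P, trans)
    return {s for s in states
            if any(t in pre for t in states if equivalent(s, t, sort))}

# Hypothetical system: machine 0 may move 0 -> 1 only while machine 1 is in
# state 1; machine 1 toggles while machine 0 is in state 0.
states = set(product((0, 1), (0, 1)))
trans = {((0, 1), (1, 1)), ((0, 0), (0, 1)), ((0, 1), (0, 0))}
P = {(1, 0), (1, 1)}          # "machine 0 is in state 1", I-sorted for I = {0}
I = {0}

lower = e_forall_x(P, trans, states, I)
upper = e_exists_x(P, trans, states, I)
```

In this example the lower operator yields the empty set, because whether machine 0 can move depends on machine 1, while the upper operator yields both states in which machine 0 is in state 0.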

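Now we can define $\mathcal{L}[\![EX\,\phi]\!]_I$ and $\mathcal{U}[\![EX\,\phi]\!]_I$ as

    $\mathcal{L}[\![EX\,\phi]\!]_I = [\![E_\forall X]\!]_I\ \mathcal{L}[\![\phi]\!]_I$
    $\mathcal{U}[\![EX\,\phi]\!]_I = [\![E_\exists X]\!]_I\ \mathcal{U}[\![\phi]\!]_I.$

Both $\mathcal{L}[\![EX\,\phi]\!]_I$ and $\mathcal{U}[\![EX\,\phi]\!]_I$ are clearly I-sorted because both $[\![E_\forall X]\!]_I$ and
$[\![E_\exists X]\!]_I$ are so, and if $\phi$ satisfies the properties in section 4 then so do
$\mathcal{L}[\![EX\,\phi]\!]_I$ and $\mathcal{U}[\![EX\,\phi]\!]_I$.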



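Globally and Until operators: The semantics for $EG\,\phi$ is defined in the
same manner as EX with an added fixed point calculation:

    $\mathcal{L}[\![EG\,\phi]\!]_I = \nu U.\ \mathcal{L}[\![\phi]\!]_I \cap [\![E_\forall X]\!]_I U$
    $\mathcal{U}[\![EG\,\phi]\!]_I = \nu U.\ \mathcal{U}[\![\phi]\!]_I \cap [\![E_\exists X]\!]_I U$

The result is also I-sorted and satisfies the properties in section 4 if $\phi$ does. For
$E(\phi_1\,U\,\phi_2)$ we take

    $\mathcal{L}[\![E(\phi_1\,U\,\phi_2)]\!]_I = \mu U.\ \mathcal{L}[\![\phi_2]\!]_I \cup (\mathcal{L}[\![\phi_1]\!]_I \cap [\![E_\forall X]\!]_I U)$
    $\mathcal{U}[\![E(\phi_1\,U\,\phi_2)]\!]_I = \mu U.\ \mathcal{U}[\![\phi_2]\!]_I \cup (\mathcal{U}[\![\phi_1]\!]_I \cap [\![E_\exists X]\!]_I U).$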
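As an illustration of the bounded EG calculation, the following sketch computes the lower and upper approximations by explicit greatest-fixed-point iteration and compares them with the exact result. The tiny system, the helper names, and the use of frozensets instead of ROBDDs are assumptions made only for this example.

```python
from itertools import product

def equivalent(s, t, sort):
    return all(s[i] == t[i] for i in sort)

def ex(P, trans):
    """States with at least one successor in P."""
    return frozenset(s for (s, t) in trans if t in P)

def lift(P, trans, states, sort, quantifier):
    """quantifier=all gives the lower operator, quantifier=any the upper one."""
    pre = ex(P, trans)
    return frozenset(s for s in states
                     if quantifier(t in pre for t in states
                                   if equivalent(s, t, sort)))

def gfp(f, top):
    """Greatest fixed point of a monotone f, iterating downwards from `top`."""
    x = top
    while f(x) != x:
        x = f(x)
    return x

# Hypothetical system: state (0, 0) has a self-loop, (0, 1) must leave phi.
states = frozenset(product((0, 1), (0, 1)))
trans = {((0, 0), (0, 0)), ((0, 0), (0, 1)), ((0, 1), (1, 1)),
         ((1, 1), (1, 1)), ((1, 0), (1, 0))}
phi = frozenset(s for s in states if s[0] == 0)   # "machine 0 is in state 0"
I = {0}

lower = gfp(lambda U: phi & lift(U, trans, states, I, all), states)
upper = gfp(lambda U: phi & lift(U, trans, states, I, any), states)
exact = gfp(lambda U: phi & ex(U, trans), states)
```

With only machine 0 in the sort, the lower approximation is empty and the upper approximation is all of `phi`, so the exact result lies strictly between them and more machines would have to be included to decide the formula.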

6 Reusing Bounded CTL Calculations 

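One problem with the operators $\mathcal{L}[\![\phi]\!]_I$ and $\mathcal{U}[\![\phi]\!]_I$ from section 5, when used
in the bounded CTL verifier, is that all previously found states have to be
rediscovered whenever a new set of machines I is introduced. In this section
we show how to avoid this problem for the EG operator and sketch how to
do it for the $E(\phi_1\,U\,\phi_2)$ operator. The final algorithm is shown in figure 2.

First we show how the calculation of $\mathcal{U}[\![EG\,\phi]\!]_{I_2}$ can be improved by reusing
the previous calculation of $\mathcal{U}[\![EG\,\phi]\!]_{I_1}$ when $I_1 \subseteq I_2$. From the definition of
$\mathcal{U}[\![EG\,\phi]\!]_I$ we get the following:

    $\mathcal{U}[\![EG\,\phi]\!]_{I_2} \subseteq \mathcal{U}[\![EG\,\phi]\!]_{I_1} \subseteq \mathcal{U}[\![\phi]\!]_{I_1}$ and
    $\mathcal{U}[\![EG\,\phi]\!]_{I_2} \subseteq \mathcal{U}[\![\phi]\!]_{I_2} \subseteq \mathcal{U}[\![\phi]\!]_{I_1}$

and from this we know that

    $\mathcal{U}[\![EG\,\phi]\!]_{I_2} \subseteq \mathcal{U}[\![EG\,\phi]\!]_{I_1} \cap \mathcal{U}[\![\phi]\!]_{I_2}.$

We also know, from Tarski's fixed point theorem, that $\nu f \subseteq x \Rightarrow \nu f = f^i(x)$,
which means the maximum fixed point calculation of f can be started from any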
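    $\mathcal{B}[\![tt]\!]_{I_k}$ = $(S, S)$

    $\mathcal{B}[\![l_i = s]\!]_{I_k}$ = let $L = \{ s' \in S \mid s'_i = s \}$
                      $U = \{ s' \in S \mid s'_i = s \}$
                  in $(L, U)$

    $\mathcal{B}[\![\neg\phi]\!]_{I_k}$ = let $(L, U) = \mathcal{B}[\![\phi]\!]_{I_k}$
                  in $(S \setminus U,\ S \setminus L)$

    $\mathcal{B}[\![\phi_1 \land \phi_2]\!]_{I_k}$ = let $(L_1, U_1) = \mathcal{B}[\![\phi_1]\!]_{I_k}$
                      $(L_2, U_2) = \mathcal{B}[\![\phi_2]\!]_{I_k}$
                  in $(L_1 \cap L_2,\ U_1 \cap U_2)$

    $\mathcal{B}[\![EX\,\phi]\!]_{I_k}$ = let $(L, U) = \mathcal{B}[\![\phi]\!]_{I_k}$
                  in $([\![E_\forall X]\!]_{I_k} L,\ [\![E_\exists X]\!]_{I_k} U)$

    $\mathcal{B}[\![EG\,\phi]\!]_{I_k}$ = let $(L_1, U_1) = \mathcal{B}[\![\phi]\!]_{I_k}$
                      $(L_2, U_2) = \mathcal{B}[\![EG\,\phi]\!]_{I_{k-1}}$
                      $U = \nu V.\ (U_1 \cap U_2) \cap [\![E_\exists X]\!]_{I_k} V$      (a)
                      $L = \nu V'.\ (L_1 \cap U) \cap [\![E_\forall X]\!]_{I_k} V'$      (b)
                  in $(L, U)$

    $\mathcal{B}[\![E(\phi_1\,U\,\phi_2)]\!]_{I_k}$ = let $(L_1, U_1) = \mathcal{B}[\![\phi_1]\!]_{I_k}$
                      $(L_2, U_2) = \mathcal{B}[\![\phi_2]\!]_{I_k}$
                      $(L_3, \_) = \mathcal{B}[\![E(\phi_1\,U\,\phi_2)]\!]_{I_{k-1}}$
                      $L = \mu V.\ (L_2 \cup L_3) \cup (L_1 \cap [\![E_\forall X]\!]_{I_k} V)$      (c)
                      $U = \mu V'.\ (U_2 \cup L) \cup (U_1 \cap [\![E_\exists X]\!]_{I_k} V')$      (d)
                  in $(L, U)$

Fig. 2. Full description of how the lower and upper approximations $(\mathcal{L}[\![\phi]\!]_I, \mathcal{U}[\![\phi]\!]_I) =
\mathcal{B}[\![\phi]\!]_I$ are calculated for a state/event system S. The sorts are $I_k$ for the current sort
and $I_{k-1}$ for the previous sort. Initially we have $\mathcal{B}[\![\phi]\!]_{I_0} = (\emptyset, S)$ and $I_0$ is the sort of
the expression. We use L for a lower approximation and U for an upper approximation.
The lines (a)-(d) show where we reuse previously found states.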



x as long as x includes the maximal fixed point of f. Here we use $f^i(x)$ as the
i'th application of f on itself. From this it is clear that the fixed point calculation
of $\mathcal{U}[\![EG\,\phi]\!]_{I_2}$ can be started from the intersection of the two sets $\mathcal{U}[\![EG\,\phi]\!]_{I_1}$
and $\mathcal{U}[\![\phi]\!]_{I_2}$. Normally this fixed point calculation would have been started from
$\mathcal{U}[\![\phi]\!]_{I_2}$, but in this way we reuse the calculation of $\mathcal{U}[\![EG\,\phi]\!]_{I_1}$ to speed up the
calculation of $\mathcal{U}[\![EG\,\phi]\!]_{I_2}$.

The same idea can be used for the lower approximation $\mathcal{L}[\![EG\,\phi]\!]_{I_2}$, where
the fixed point iteration can be started from the intersection of $\mathcal{L}[\![\phi]\!]_{I_2}$ and
$\mathcal{U}[\![EG\,\phi]\!]_{I_2}$, so that we reuse the calculation of the upper approximation. The
algorithm in figure 2 utilizes this in line (a) for the upper approximation and in
line (b) for the lower approximation.

Exactly the same can be done for $\mathcal{L}[\![E(\phi_1\,U\,\phi_2)]\!]_I$ and $\mathcal{U}[\![E(\phi_1\,U\,\phi_2)]\!]_I$, except
that the previous lower approximations should be used to restart the calculation,
as shown in lines (c) and (d) in figure 2.
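The effect of restarting a greatest fixed point from a smaller set can be seen on a toy example. This is a sketch under assumptions: the chain-shaped system and the set `earlier` (standing in for a previously computed upper approximation) are invented for illustration, and sets of integers replace ROBDDs.

```python
def ex(X, trans):
    """States with at least one successor in X."""
    return frozenset(s for (s, t) in trans if t in X)

def gfp(f, start):
    """Iterate f downwards from `start`; return the fixed point and the
    number of applications of f that were needed."""
    x, steps = start, 0
    while True:
        nxt = f(x)
        steps += 1
        if nxt == x:
            return x, steps
        x = nxt

# Hypothetical chain 0 -> 1 -> ... -> 9 with a self-loop on state 0.
states = frozenset(range(10))
trans = {(i, i + 1) for i in range(9)} | {(0, 0)}
P = frozenset(range(9))                    # the states satisfying phi
f = lambda X: P & ex(X, trans)             # one step of the EG iteration

# Start from the full state space ...
from_top, steps_top = gfp(f, states)

# ... or from an earlier result that is known to contain the fixed point.
earlier = frozenset({0, 1, 2})
from_reuse, steps_reuse = gfp(f, earlier)
```

Both runs reach the same fixed point `{0}`, but the restarted iteration needs 3 applications of `f` instead of 10. The restart is only sound because `earlier` includes the greatest fixed point, which is the condition stated above.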
System    Machines  Local states  Declared   Reachable
INTERVM          6           182  IV             15144
VCR              7            46  10®             1279
DKVM             9            55  10®           377568
HI-FI            9            59  10"          1416384
FLOW            10           232  10®            17040
MOTOR           12            41  10®            34560
AVS             12            66  10"          1438416
VIDEO           13            74  10®          1219440
ATC             14           194  10"®         6399552
OIL             24            96  10"®       237230192
TRAIN1         373           931  10"®®              -
TRAIN2        1421          3204  10""®              -


Table 1. The state/event systems used in the experiments. The last two columns 
show the size of the declared and reachable state space. The size of the declared state 
space is the product of the number of local states of each machine. The reachable state 
space is only known for those systems where a forward iteration of the state space can 
complete. 



7 Examples 

The technique presented here has been tested on ten industrial state/event
systems and two systems constructed by students in a course on embedded systems.
The examples are all constructed using visualSTATE™ [13] and cover a large
range of different applications. The examples are HI-FI, AVS, ATC, FLOW,
MOTOR, INTERVM, DKVM, OIL, TRAIN1 and TRAIN2, which are industrial examples,
and VCR and VIDEO, which are constructed by students. In table 1 we have listed
some characteristics for these examples.

The experiments were carried out on a 166 MHz Pentium PC with 32 MB of
memory, running Linux. For the ROBDD encoding we used BuDDy [10], a locally
produced ROBDD package which is comparable in efficiency to other
state-of-the-art ROBDD packages, such as CUDD [15].

For each example we tested three different sets of CTL formulae. One set 
of formulae for detecting non-determinism in the system, one set for detecting 
local deadlocks and one for finding homestates. 

Non-determinism occurs when two transitions leading out of a state depend
on the same event and have guards that are enabled at the same time in a
reachable global state. That is, the intersection of the two guards $g = g_1 \land g_2$ should
be non-empty and reachable. Every combination of guards was then checked
for reachability using the formula EF g.

Locally deadlocked states are local states from which no outgoing transition
can ever be enabled, no matter how the rest of the system behaves.
So for each local state s in the system we check for absence of local deadlocks
using the formula $AG\,(s \Rightarrow EF\,\neg s)$.
Homestates are states that can be reached from any point in the reachable
state space. So for each local state s of the system we get the formula AG (EF s).
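For intuition, the local-deadlock and homestate checks can be written down for an explicit transition graph. The sketch below is not the stepwise ROBDD algorithm of this paper; the graph, the function names, and the representation of a local state as the set of global states in which it is active are assumptions for illustration.

```python
def reachable(init, trans):
    """Forward closure of the initial states."""
    seen, frontier = set(init), list(init)
    while frontier:
        s = frontier.pop()
        for (a, b) in trans:
            if a == s and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen

def ef(target, trans):
    """States from which some path reaches `target` (least fixed point)."""
    result = set(target)
    changed = True
    while changed:
        changed = False
        for (a, b) in trans:
            if b in result and a not in result:
                result.add(a)
                changed = True
    return result

def is_homestate(local, init, trans):
    """AG (EF s): every reachable state can reach the local state."""
    return reachable(init, trans) <= ef(local, trans)

def no_local_deadlock(local, states, init, trans):
    """AG (s => EF not s): every reachable state in s can leave s again."""
    can_leave = ef(states - local, trans)
    return (reachable(init, trans) & local) <= can_leave

# Hypothetical global graph: 0 -> 1 -> 2 -> 1, starting in 0.
states = {0, 1, 2}
trans = {(0, 1), (1, 2), (2, 1)}
init = {0}

home_1 = is_homestate({1}, init, trans)
home_0 = is_homestate({0}, init, trans)
free_12 = no_local_deadlock({1, 2}, states, init, trans)
free_1 = no_local_deadlock({1}, states, init, trans)
```

In this graph the local state covering global state 1 is a homestate, the one covering 0 is not, and a local state covering both 1 and 2 is locally deadlocked because it can never be left.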

Unfortunately, we only have access to the examples in an anonymous form,
so we have no way of generating more specialized properties.

In table 2 we have listed the time it takes to complete checking a whole set of
CTL formulae using the standard backwards traversal with either all machines
in the system or only the machines in the dependency closure, and the time used
with stepwise traversal. For the largest system only the stepwise traversal
succeeds, and with the exception of one system (ATC) the stepwise traversal
is also faster than or comparable in speed to the standard backwards traversal.

We have also shown the number of tests that can be verified using fewer
machines than in the dependency closure, how much of the dependency closure
was needed to do it, how many tests had to include the full dependency
closure, and the average size of that dependency closure. From this we
can see that in the TRAIN2 example we can verify most of the formulae using
only a small fraction (3-15%) of the dependency closure, and when the full
dependency closure has to be included, its average size is only about 1
(although we know that some tests include more than 200 machines in the
dependency closure). This indicates that TRAIN2 is a loosely coupled system, i.e.
a system with few dependencies among the state machines.

We also see that ATC and OIL are more strongly coupled, as the average 
dependency closure is larger than for the other examples. This property is also 
mirrored in the time needed to verify the two examples. 

8 Conclusion 

We have extended the successful model checking technique presented in [11]
with the ability to do full CTL model checking and not only reachability and
deadlock detection. We have also added the calculation of both upper and lower
approximations to the result, thereby making it possible to stop earlier
in the verification process with either a negative or a positive answer.

Test examples have shown the stepwise traversal of the state space to be
more efficient than the normal backwards traversal, in terms of both time and
space, for a range of industrial examples. We have also shown that the stepwise
technique may succeed in cases where the standard techniques fail.

The examples also indicate that the stepwise traversal works best on loosely
coupled systems, that is, systems with few dependencies among the involved
state machines.
                                                          Part. DC.            Full DC.
System         Test   Num    Full     DC.    Step   Red.     Ok          Err          Ok    Err   Size
INTERVM (6)    D      182     6.7     6.2     6.8            150  41%      0   0%     32      0    4.5
               H      182    50.3    47.3    40.6   +19%      96  57%      0   0%     86      0    4.8
VCR (7)        C        1     0.3     0.2     0.2              0   0%      0   0%      1      0    6.0
               D       46     0.6     0.4     0.8              2  40%      0   0%     44      0    5.4
               H       46     2.2     1.3     1.5              2  40%      0   0%     44      0    5.4
DKVM (9)       D       55     0.5     0.4     0.4             46  20%      0   0%      9      0    1.0
               H       55     8.7     8.7     6.7             27  45%      1  11%     27      0    1.7
HI-FI (9)      D       59     0.8     0.7     0.5             56  18%      0   0%      3      0    3.0
               H       59     3.5     3.1     2.3             52  56%      0   0%      7      0    5.3
FLOW (10)      C        2     0.6     0.6     0.6              2  75%      0   0%      0      0      -
               D      232     1.4     1.1     1.1            224  49%      0   0%      8      0    1.0
               H      232     3.6     2.4     2.1            223  49%      0   0%      9      0    1.2
MOTOR (12)     D       41     0.9     0.9     0.6             32  21%      0   0%      9      0    1.0
               H       41     1.2     1.2     0.7             32  24%      0   0%      9      0    1.0
AVS (12)       C        5     1.2     1.2     1.1              4  27%      0   0%      1      0    3.0
               D       66     2.0     1.7     1.5             57  34%      0   0%      8      1    1.3
               H       66     6.0     5.2     4.0             42  64%      3  76%     20      1    3.2
VIDEO (13)     D       74     0.9     0.7     0.6             70  30%      0   0%      4      0    2.0
               H       74     2.6     1.3     1.3             57  54%      0   0%     17      0    4.3
ATC (14)       C      122   153.5   140.4   138.6   +10%       ?           0   0%      ?      0   12.0
               D      194   119.0   110.2   135.3   -14%       3   8%     63  75%      6    122   11.6
               H      194   495.3   461.2   443.7   +10%       3   8%     63  75%      6    122   11.6
OIL (24)       C      114   242.8   177.4   163.8   +33%       2  25%     29  25%     83      0   12.0
               D       96    15.0     9.6     7.3   +51%      58  19%      6  17%     26      6    6.6
               H       96    35.5    23.8    15.5   +56%      22  22%     33   9%     31     10    7.8
TRAIN1 (373)   C       99    76.1     3.7     3.6   +95%      82  57%      0   0%     17      0    4.5
               D      931   449.7     5.0     5.3   +99%     388  41%      9  58%    502     32    1.0
               H      931   478.8     4.9     4.9   +99%     354  41%     42  50%    500     35    1.0
TRAIN2 (1421)  C     1245       -       -   265.4  +100%     912   8%     30   6%    303      0    1.1
               D     3204       -       -   199.0  +100%    1569   3%     16   8%   1583     36    1.0
               H     3204       -       -   197.1  +100%    1521   3%     58  15%   1585     40    1.0


Table 2. Test examples for runtime and dependency analysis. All times are in seconds. 
The tests are C-Conflicts, D-Deadlock and H-Homestates. The Num column shows the 
number of tests, the Full column is the time used with all machines included from 
the beginning (and still using a partitioned transition relation), the DC. column is 
the time used with only the dependency closure included from the beginning and the 
Step column is the time used with stepwise traversal. A dash means timeout after 
one hour or spaceout around 20Mb, all other tests were done with less than 250k 
ROBDD nodes in memory at one time. The Red. column is the reduction in runtime
= (Full - Step) / Full. Some systems have no conflicts and we have left out the data
for these. The Part. DC. column shows the number of tests that could be verified using 
fewer state machines than in the full dependency closure, whether the result was true 
(Ok) or false (Err) and how much of the dependency closure was included. The Full 
DC. column shows the number of tests that needed the full dependency closure to be 
proven true (Ok) or false (Err) and the average size of the dependency closure (Size). 
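The Red. column can be recomputed from the Full and Step columns. The short check below uses four rows of table 2 as input; the helper name `reduction` and the rounding to whole percent are assumptions about how the printed values were obtained.

```python
def reduction(full, step):
    """Runtime reduction (Full - Step) / Full, in whole percent."""
    return round(100 * (full - step) / full)

# (Full, Step) times in seconds, taken from table 2.
rows = {
    "INTERVM H": (50.3, 40.6),
    "ATC D": (119.0, 135.3),
    "OIL C": (242.8, 163.8),
    "TRAIN1 C": (76.1, 3.6),
}

reductions = {name: reduction(full, step) for name, (full, step) in rows.items()}
```

These evaluate to 19, -14, 33 and 95 percent respectively, matching the table; the negative value for ATC D reflects the one case where the stepwise traversal is slower.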
Abstract. This paper presents optimizations for verifying systems with complex
time-invariant constraints. These constraints arise naturally from modeling phy- 
sical systems, e.g., in establishing the relationship between different components 
in a system. To verify constraint-rich systems, we propose two new optimizati- 
ons. The first optimization is a simple, yet powerful, extension of the conjunctive- 
partitioning algorithm. The second is a collection of BDD-based macro-extraction 
and macro-expansion algorithms to remove state variables. We show that these two 
optimizations are essential in verifying constraint-rich problems; in particular, this 
work has enabled the verification of fault diagnosis models of the Nomad robot 
(an Antarctic meteorite explorer) and of the NASA Deep Space One spacecraft. 



1 Introduction 

This paper presents techniques for using symbolic model checking to automatically
verify a class of real-world applications that have many time-invariant constraints. An
example of constraint-rich systems is the symbolic models developed by NASA for
on-line fault diagnosis [15]. These models describe the operation of components in complex
electro-mechanical systems, such as autonomous spacecraft or robot explorers. The
models consist of interconnected components (e.g., thrusters, sensors, motors, computers,
and valves) and describe how the mode of each component changes over time. Based on
these models, the Livingstone diagnostic engine [15] monitors sensor values and detects,
diagnoses, and tries to recover from inconsistencies between the observed sensor values
and the predicted modes of the components. The relationships between the modes and
sensor values are encoded using symbolic constraints. Constraints between state
variables are also used to encode interconnections between components. We have developed
an automatic translator from such fault models to SMV (Symbolic Model Verifier) [10],
where mode transitions are encoded as transition relations and state-variable constraints
are translated into sets of time-invariant constraints.

To verify constraint-rich systems, we introduce two new optimizations. The first 
optimization is a simple extension of the conjunctive-partitioning algorithm. The other is 
a collection of BDD-based macro-extraction and macro-expansion algorithms to remove 
redundant state variables. We show that these two optimizations are essential in verifying 
constraint-rich problems. In particular, these optimizations have enabled the verification 
of fault diagnosis models for the Nomad robot (an Antarctic meteorite explorer) [1] and 
the NASA Deep Space One (DS1) spacecraft [2]. These models can be quite large, with
up to 1200 state bits. 

The rest of this paper is organized as follows. We first briefly describe symbolic 
model checking and how time-invariant constraints arise naturally from modeling (Sec- 
tion 2). We then present our new optimizations: an extension to conjunctive partitioning 
(Section 3), and BDD-based algorithms for eliminating redundant state variables (Sec- 
tion 4). We then show the results of a performance evaluation on the effects of each 
optimization (Section 5). Finally, we present a comparison to prior work (Section 6) and 
some concluding remarks (Section 7). 

2 Background 

Symbolic model checking [5,6,10] is a fully automatic verification paradigm that checks 
temporal properties (e.g., safety, liveness, fairness, etc.) of finite state systems by sym- 
bolic state traversal. The core enabling technology for symbolic model checking is the 
use of the Binary Decision Diagram (BDD) representation [4] for state sets and state 
transitions. BDDs represent Boolean formulas canonically as directed acyclic graphs 
such that equivalent sub-formulas are uniquely represented as a single subgraph. This 
uniqueness property makes BDDs compact and enables dynamic programming to be 
used for computing Boolean operations symbolically. 

To use BDDs in model checking, we need to map sets of states, state transitions, and 
state traversal to the Boolean domain. In this section, we briefly describe this mapping 
and motivate how time-invariant constraints arise. We then finish with definitions of 
some additional terminology to be used in the rest of the paper. 

2.1 Representing State Sets and Transitions 

In the symbolic model checking of finite state systems, a state typically describes the
values of many components (e.g., latches in digital circuits) and each component is
represented by a state variable. Let $V = \{v_1, \ldots, v_n\}$ be the set of state variables in
a system; then a state can be described by assigning values to all the variables in V.
This valuation can in turn be written as a Boolean formula that is true exactly for the
valuation, as $\bigwedge_{i=1}^{n} (v_i == c_i)$, where $c_i$ is the value assigned to the variable $v_i$, and
the "==" represents the equality operator in a predicate (similar to the C programming
language). A set of states can be represented as a disjunction of the Boolean formulas
that represent the states. We denote the BDD representation for a set of states S by S(V).
In addition to the set of states, we also need to map the system's state transitions
to the Boolean domain. We extend the above concept of representing a set of states to
representing a set of ordered pairs of states. To represent a pair of states, we need two
sets of state variables: V, the set of present-state variables for the first tuple, and V', the
set of next-state variables for the second tuple. Each variable v in V has a corresponding
next-state variable v' in V'. A valuation of variables in V and V' can be viewed as a
state transition from one state to another. A transition relation can then be represented
as a set of these valuations. We denote the BDD representation of a transition relation
T as T(V, V').

In modeling finite state systems, the overall state transitions are generally specified
by defining the valid transitions for each state variable. To support non-deterministic
transitions of a state variable, the expression that defines the transitions evaluates to a
set, and the next-state value of the state variable is non-deterministically chosen from the
elements in the set. Hereafter, we refer to an expression that evaluates to a set either as a
set expression or as a non-deterministic expression depending on the context, and we use
the bold font type, as in f, to represent such an expression. Let $f_i$ be the set expression
representing state transitions of the state variable $v_i$. Then the BDD representation for $v_i$'s
transition relation $T_i$ can be defined as $T_i(V, V') := v'_i \in f_i(V)$. For synchronous
systems, the BDD for the overall state transition relation T is $T(V, V') := \bigwedge_{i=1}^{n} T_i(V, V')$.

Detailed descriptions of this formulation, including mapping of asynchronous systems,
can be found in [5].
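The construction of the synchronous transition relation from per-variable set expressions can be mimicked with explicit enumeration. In the sketch below the two variables, their domains and their set expressions are hypothetical, and Python predicates stand in for BDDs.

```python
from itertools import product

# Hypothetical model: two state variables with small finite domains.
domains = {"x": (0, 1), "y": (0, 1)}

# One set expression per variable: it maps the present state to the set of
# allowed next-state values (a non-deterministic expression if the set has
# more than one element).
f = {
    "x": lambda s: {0, 1} if s["y"] == 1 else {s["x"]},
    "y": lambda s: {1 - s["y"]},
}

def T_i(var, s, s_next):
    """Per-variable relation: the next value of `var` lies in its set expression."""
    return s_next[var] in f[var](s)

def T(s, s_next):
    """Synchronous transition relation: conjunction of all per-variable relations."""
    return all(T_i(var, s, s_next) for var in domains)

def all_states():
    names = sorted(domains)
    return [dict(zip(names, values))
            for values in product(*(domains[n] for n in names))]

def successors(s):
    return [t for t in all_states() if T(s, t)]

succ_a = successors({"x": 0, "y": 1})   # x is chosen freely, y toggles
succ_b = successors({"x": 0, "y": 0})   # x keeps its value, y toggles
```

The first call yields two successors because the set expression for `x` is non-deterministic when `y` is 1; the second yields exactly one.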

2.2 Time-Invariant Constraints and Their Common Usages 

In symbolic model checking, time-invariant constraints specify the conditions that must
always hold. More formally, let $C_1, \ldots, C_l$ be the time-invariant constraints and let
$C := C_1 \land C_2 \land \ldots \land C_l$. Then, in symbolic state traversal, we consider only states where
C is true. We refer to C as the constrained space.

To motivate how time-invariant constraints arise naturally in modeling complex 
systems, we describe three common usages. One common usage is to make the same non- 
deterministic choice across multiple expressions in transition relations. For example, in a 
master-slave model, the master can non-deterministically choose which set of idle slaves 
to assign the pending jobs, and the slaves’ next-state values will depend on the choice 
made. To model this, let f be a non-deterministic expression representing how the master 
makes its choice. If f is used multiple times, then each use makes a non-deterministic 
choice independently of other uses. Thus, to ensure that the same non-deterministic 
choice is seen by the slaves, a new state variable u is introduced to record the choice 
made, and u is then used to define the slaves' transition relations. This recording process
is expressed as the time-invariant constraint $u \in f$.

Another common usage is for establishing the interface between different compo- 
nents in a system. For example, suppose two components are connected with a pipe of a 
fixed capacity. Then, the input of one component is the minimum of the pipe’s capacity 
and the output of the other component. This relationship is described as a time-invariant 
constraint between the input and the output of these two components. 

A third common usage is for specific uses of generic parts. For example, a bi-directional
fuel pipe may be used to connect two components. If we want to make sure the fuel 
flows only one way, we need to constrain the valves in the fuel pipe. These constraints
are specified as time-invariant constraints. In general, specific uses of generic parts arise 
naturally in both the software and the hardware domain as we often use generic building 
blocks in constructing a complex system. 

In the examples above, the use of time-invariant constraints is not always necessary
because some of these constraints can be directly expressed as a part of the transition relation
and the associated state variables can be removed. However, these constraints are used
to facilitate the description of the system or to reflect the way complex systems are built.
Without these constraints, multiple expressions would need to be combined into possibly
a very complicated expression. Performing this transformation manually can be
error-prone. Thus it is up to the verification tool to automatically perform these transformations
and remove unnecessary state variables. Our optimizations for constraint-rich models
automatically eliminate redundant state variables (Section 4) and partition the
remaining constraints (Section 3).

2.3 Symbolic State Traversal 

To reason about temporal properties, the pre-image and the image of the transition
relation are used for symbolic state traversal, and time-invariant constraints are used to
restrict the valid state space. Based on the BDD representations of a state set S and the
transition relation T, we can compute the pre-image and the image of S, while restricting
the computations to the constrained space C, as follows:

    $\text{pre-image}(S)(V) := C(V) \land \exists V'.[T(V, V') \land (S(V') \land C(V'))]$    (1)
    $\text{image}(S)(V') := C(V') \land \exists V.[T(V, V') \land (S(V) \land C(V))]$    (2)

One limitation of the BDD representation is that the monolithic BDD for the
transition relation T is often too large to build. A solution to this problem is the conjunctive
partitioning [5] of the transition relation. In conjunctive partitioning, the transition
relation is represented as a conjunction $P_1 \land P_2 \land \ldots \land P_k$ with each conjunct $P_i$ represented
by a BDD. Then, the pre-image can be computed by conjuncting with one $P_i$ at a time,
and by using early quantification to quantify out variables as soon as possible. The
early-quantification optimization is based on the property that sub-formulas can be
moved out of the scope of an existential quantification if they do not depend on any of the
variables being quantified. Formally, let $V'_i$, a subset of $V'$, be the set of variables that
do not appear in any of the subsequent $P_j$'s, where $1 \le i \le k$ and $i < j \le k$. Then the
pre-image can be computed as

    $p_1 := \exists V'_1.[P_1(V, V') \land (S(V') \land C(V'))]$    (3)
    $p_2 := \exists V'_2.[P_2(V, V') \land p_1]$
    $\vdots$
    $p_k := \exists V'_k.[P_k(V, V') \land p_{k-1}]$
    $\text{pre-image}(S)(V) := C(V) \land p_k$
The determination and ordering of partitions (the $P_i$'s above) can have significant
performance impact. Commonly used heuristics [7,11] treat the state variables' transition
relations ($T_i$'s) as conjuncts. The ordering step then greedily schedules the partitions to
quantify out more variables as soon as possible, while introducing fewer new variables.
Finally, the ordered partitions are tentatively merged with their predecessors to reduce
the number of intermediate results. Each merged result is kept only if the resulting graph
size is less than a pre-determined limit.

The conjunctive partitioning for the image computation is performed similarly, with
present-state variables in V being the quantifying variables instead of next-state variables
in V'. However, since the quantifying variables are different between the image and the
pre-image computation, the resulting conjuncts for image computation are typically very
different from those for pre-image computation.



2.4 Additional Terminology 

We define the ITE operator (if-then-else) as follows: given arbitrary expressions f and
g, where f and g may both be set expressions, and a Boolean expression p, then

    $ITE(p, f, g)(X) := \begin{cases} f(X) & \text{if } p(X); \\ g(X) & \text{otherwise;} \end{cases}$

where X is the set of variables used in expressions p, f, and g. We define a care-space
optimization as any algorithm care-opt that has the following properties: given an arbitrary
function f, where f may be a set expression, and a Boolean formula c, then

    $\text{care-opt}(f, c) := ITE(c, f, d),$

where d is defined by the particular algorithm used. The usual interpretation of this is
that we only care about the values of f when c is true. We will refer to c as the care space
and $\neg c$ as the don't-care space. The goal of care-space optimizations is to heuristically
minimize the representation for f by choosing a suitable d in the don't-care space.
Descriptions and a study of some care-space optimizations, including the commonly
used restrict algorithm [6], can be found in [13].
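To make the two definitions concrete, the following sketch implements ITE pointwise and a deliberately simple care-space optimization. The function `care_opt_constant` is a toy invented for this example (it is not the restrict algorithm): it replaces f by a constant when f is constant on the care space, which corresponds to choosing d as that constant.

```python
def ite(p, f, g):
    """Pointwise if-then-else over functions of one argument."""
    return lambda x: f(x) if p(x) else g(x)

def care_opt_constant(f, c, domain):
    """Toy care-space optimization: if f takes a single value wherever c
    holds, return the constant function with that value; otherwise return f
    unchanged. In both cases the result agrees with f on the care space."""
    values = {f(x) for x in domain if c(x)}
    if len(values) == 1:
        (k,) = values
        return lambda x: k
    return f

# Hypothetical example over the integers 0..7.
domain = range(8)
f = lambda x: x % 2 == 0          # "x is even"
c = lambda x: x in (2, 4, 6)      # care space: only these inputs matter

simplified = care_opt_constant(f, c, domain)
agrees = all(simplified(x) == f(x) for x in domain if c(x))
mixed = ite(c, f, lambda x: False)
```

On the care space `simplified` agrees with `f`, while outside it the two may differ, which is exactly the freedom a care-space optimization exploits.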



3 Extended Conjunctive Partitioning 

The first optimization is the application of the conjunctive-partitioning algorithm on the 
time-invariant constraints. This extension is derived based on two observations. First, as 
with the transition relations, the BDD representation for time-invariant constraints can 
be too large to be represented as a monolithic graph. Thus, it is crucial to represent the 
constraints as a set of conjuncts rather than a monolithic graph. 

Second, in constraint-rich models, many quantifying variables (variables being quan- 
tified) do not appear in the transition relation. There are two common causes for this. 
First, when time-invariant constraints are used to make the same non-deterministic choi- 
ces, new variables are introduced to record these choices (described as the first example 
in Section 2.2). In the transition relation, these new variables are used only in their 
present-state form. Thus, their corresponding next-state variables do not appear in the 
transition relation, and for the pre-image computation, these next-state variables are parts 
of the quantifying variables. The other cause is that many state variables are used only
to establish time-invariant constraints. Thus, both the present- and the next-state version 
of these variables do not appear in the transition relations. 

Based on this observation, we can improve the early-quantification optimization by
pulling out the quantifying variables (W') that do not appear in any of the transition
relations. Then, these quantifying variables (W') can be used for early quantification
in conjunctive partitioning of the constrained space (C) where the time-invariant
constraints hold. Formally, let $Q_1, Q_2, \ldots, Q_m$ be the partitions produced by the conjunctive
partitioning of the constrained space C, where $C = Q_1 \land Q_2 \land \ldots \land Q_m$. For the pre-image
computation, Equation 3 is replaced by

    $q_1 := \exists W'_1.[Q_1(V') \land S(V')]$
    $q_2 := \exists W'_2.[Q_2(V') \land q_1]$
    $\vdots$
    $q_m := \exists W'_m.[Q_m(V') \land q_{m-1}]$
    $p_1 := \exists V'_1.[P_1(V, V') \land q_m]$

where $W'_i$, a subset of $W'$, is the set of variables that do not appear in any of the subsequent
$Q_j$'s, where $1 \le i \le m$ and $i < j \le m$. Similarly, this extension also applies to the
image computation.

4 Elimination of Redundant State Variables 

Our second optimization for constraint-rich models is targeted at reducing the state
space by removing unnecessary state variables. This optimization is a set of BDD-based
algorithms that compute an equivalent expression for each variable used in the
time-invariant constraints (macro extraction) and then globally replace a suitable subset of
variables with their equivalent expressions (macro expansion) to reduce the total number
of variables.

The use of macros is traditionally supported by language constructs (e.g., DEFINE in
the SMV language [10]) and by simple syntactic analyses such as detecting deterministic
assignments (e.g., a == f where a is a state variable and f is an expression) in the
specifications. However, in constraint-rich models, the constraints are often specified
in a more complex manner such as conditional dependencies on other state variables
(e.g., $p \Rightarrow (a == f)$ as conditional assignment of expression f to variable a when
p is true). To identify the set of valid macros in such models, we need to combine the
effects of multiple constraints. For these models, one drawback of syntactic analysis
is that, for each type of expression, syntactic analysis will need to add a template to
pattern match these expressions. Another more severe drawback is that it is difficult
for syntactic analysis to estimate the actual cost of instantiating a macro. Estimating
this cost is important because reducing the number of variables by macro expansion
can sometimes result in significant performance degradation caused by large increases
in other BDD sizes. These two drawbacks make the syntactic approach unsuitable for
models with complex time-invariant constraints.
Our approach uses BDD-based algorithms to analyze time-invariant constraints and
to derive the set of possible macros. The core algorithm is a new assignment-extraction 
algorithm that extracts assignments from arbitrary Boolean expressions (Section 4.1). 
For each variable, by extracting its assignment form, we can determine the variable’s 
corresponding equivalent expression, and when appropriate, globally replace the variable 
with its equivalent expression (Section 4.2). The strength of this algorithm is that by using 
BDDs, the cost of macro expansion can be better characterized since the actual model 
checking computation is performed using BDDs. 

Note that there have been a number of research efforts on BDD-based redundant 
variable removal. To better compare our approach to these previous research efforts, we 
postpone the discussion of this prior work until Section 6, after describing our algorithms 
and the performance evaluation. 

4.1 BDD-Based Assignment Extraction 

The assignment-extraction problem can be stated as follows: given an arbitrary Boolean 
formula $f$ and a variable $v$ (where $v$ can be non-Boolean), find $g$ and $h$ such that 

- $f = (v \in g) \wedge h$, 

- $g$ does not depend on $v$, and 

- $h$ is a Boolean formula and does not depend on $v$. 

The expression $(v \in g)$ represents a non-deterministic assignment to variable $v$. In the 
case that $g$ always returns a singleton set, the assignment $(v \in g)$ is deterministic. A 
solution to this assignment-extraction problem is as follows: 

$$
\begin{aligned}
h &= \exists v.\, f \\
t &= \bigcup_{k \in K_v} \mathrm{ITE}\left(f|_{v \leftarrow k},\ \{k\},\ \emptyset\right) \qquad (4) \\
g &= \mathit{restrict}(t, h)
\end{aligned}
$$

where $K_v$ is the set of all possible values of variable $v$, and restrict [6] is a care-space 
optimization algorithm that tries to reduce the BDD graph size (of $t$) by collapsing the 
don’t-care space ($\neg h$). The BDD algorithm for the $\bigcup_{k \in K_v}$ operator is similar to the 
BDD algorithm for existential quantification, with the $\vee$ operator replaced by the $\cup$ 
operator for variable quantification. A correctness proof of this algorithm can be found 
in the technical-report version of this paper [17]. 
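
As an illustration of the decomposition above, the following is a minimal sketch that replaces BDDs by brute-force enumeration over small finite domains. The function name `assignment_extraction`, the dictionary-based environments, and the example constraint are assumptions made for this sketch; the restrict step is skipped, which is valid here because restrict only simplifies $t$ inside the don't-care space.

```python
from itertools import product

def assignment_extraction(f, v, domains):
    """Split f into (g, h) with f == (v in g) and h, by explicit enumeration.

    f       : predicate over a dict {variable name: value}
    v       : name of the variable to extract
    domains : dict {variable name: list of possible values}
    The returned g and h take an environment that need not mention v.
    """
    def g(env):
        # t = union over k in K_v of ITE(f[v <- k], {k}, {}); restrict is omitted
        return frozenset(k for k in domains[v] if f({**env, v: k}))

    def h(env):
        # h = exists v. f
        return len(g(env)) > 0

    return g, h

def environments(domains):
    """Enumerate every assignment to all variables."""
    names = sorted(domains)
    for values in product(*(domains[n] for n in names)):
        yield dict(zip(names, values))

# Hypothetical conditional assignment: p implies (a == q)
domains = {"a": [0, 1, 2], "p": [False, True], "q": [0, 1, 2]}
f = lambda env: (not env["p"]) or env["a"] == env["q"]
g, h = assignment_extraction(f, "a", domains)

# Check the defining equation f = (v in g) and h on every assignment
decomposition_holds = all(
    f(env) == (env["a"] in g(env) and h(env)) for env in environments(domains)
)
```

In this example $g$ is a singleton only when p is true, so this conjunct alone does not yield a deterministic assignment for a.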

4.2 Macro Extraction and Expansion 

In this section, we describe the elimination of state variables based on macro extraction 
and macro expansion. The first step is to extract macros with the algorithm shown 
in Figure 1. This algorithm extracts macros from the constrained space (C), which is 
represented as a set of conjuncts. It first uses the assignment-extraction algorithm to 
extract assignment expressions (line 5). It then identifies the deterministic assignments 
as candidate macros (line 6). For each candidate, the algorithm tests to see if applying 
the macro may be beneficial (line 7). This test is based on the heuristic that if the BDD 
graph size of a macro is not too large and its instantiation does not cause excessive 
increase in other BDDs’ graph sizes, then instantiating this macro may be beneficial. If 
the resulting right-hand side g is not a singleton set, it is kept separately (line 9). These 
g’s are combined later (line 10) to determine if the intersection of these sets would result 
in a macro (lines 11-13). 



extract_macros(C, V) 
/* Extract macros for variables in V from 
   the set C of conjuncts representing the constrained space */ 
 1  M ← ∅                        /* initialize the set of macros found so far */ 
 2  for each v ∈ V 
 3      N ← ∅                    /* initialize the set of non-singletons found so far */ 
 4      for each f ∈ C such that f depends on v 
 5          (g, h) ← assignment-extraction(f, v)     /* f = (v ∈ g) ∧ h */ 
 6          if (g always returns a singleton set)     /* macro found */ 
 7              if (is-this-result-good(g)) 
 8                  M ← {(v, g)} ∪ M 
 9          else N ← {g} ∪ N 
10      g’ ← ∩_{g ∈ N} g 
11      if (g’ always returns a singleton set)        /* macro found */ 
12          if (is-this-result-good(g’)) 
13              M ← {(v, g’)} ∪ M 
14  return M 



Fig. 1. Macro-extraction algorithm. In lines 7 and 12, “is-this-result-good” uses BDD properties 
(such as graph sizes) to determine if the result should be kept. 
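
To make the role of the intersection step (lines 10-13) concrete, here is a small enumeration-based sketch of the macro-extraction loop. It is not the BDD implementation: right-hand sides are stored as explicit tables, a dictionary keeps at most one macro per variable, don't-care handling is omitted, and `is_good` is a stand-in for the size heuristic. All names and the two example conjuncts are assumptions of this sketch.

```python
from itertools import product

def environments(domains, skip):
    """All assignments (as hashable tuples) to every variable except `skip`."""
    names = [n for n in sorted(domains) if n != skip]
    for values in product(*(domains[n] for n in names)):
        yield tuple(zip(names, values))

def extract_macros(conjuncts, variables, domains, is_good=lambda g: True):
    macros = {}
    for v in variables:
        full = frozenset(domains[v])
        non_singletons = []                   # the set N of Figure 1
        for f in conjuncts:
            # g maps each environment to the set of values allowed for v
            g = {env: frozenset(k for k in domains[v] if f({**dict(env), v: k}))
                 for env in environments(domains, v)}
            if all(s in (full, frozenset()) for s in g.values()):
                continue                      # f does not depend on v
            if all(len(s) == 1 for s in g.values()):
                if is_good(g):
                    macros[v] = g             # deterministic assignment found
            else:
                non_singletons.append(g)
        if non_singletons:
            # g' = intersection of all non-singleton right-hand sides
            combined = {env: frozenset.intersection(*(g[env] for g in non_singletons))
                        for env in non_singletons[0]}
            if all(len(s) == 1 for s in combined.values()) and is_good(combined):
                macros[v] = combined
    return macros

# Two conditional assignments that only together determine a:
#   p implies (a == q)   and   (not p) implies (a == r)
domains = {"a": [0, 1, 2], "p": [False, True], "q": [0, 1, 2], "r": [0, 1, 2]}
conjuncts = [
    lambda e: (not e["p"]) or e["a"] == e["q"],
    lambda e: e["p"] or e["a"] == e["r"],
]
macros = extract_macros(conjuncts, ["a"], domains)
```

Neither conjunct is deterministic on its own, but their intersection behaves like the macro a = ITE(p, q, r).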



After the macros are extracted, the next step is to determine the instantiation order. 
The main purpose of this algorithm (in Figure 2) is to remove circular dependencies. For 
example, if one macro defines variable $v_1$ to be $(v_2 \wedge v_3)$ and a second macro defines 
$v_2$ to be $(v_1 \vee v_4)$, then instantiating the first macro results in a circular definition in the 
second macro ($v_2 = (v_2 \wedge v_3) \vee v_4$) and thus invalidates this second macro. Similarly, 
the reverse is also true. To determine the set of macros to remove, the algorithm builds a 
dependence graph (line 1) and breaks circular dependencies based on graph sizes (lines 
2-4). It then determines the ordering of the remaining macros based on the topological 
order (line 5) of the dependence graph. 

Finally, in the topological order, each macro $(v, g)$ is instantiated in the remaining 
macros and in all other expressions (represented by BDDs) in the system, by substituting 
the variable $v$ with its equivalent expression $g$. 
order_macros(M) 

/* Determine the instantiation order of the macros in set M */ 
/* first build the dependence graph G = (M, E) */ 

1 E = {(x, y) | x = (v_x, g_x) ∈ M, y = (v_y, g_y) ∈ M, g_y depends on v_x} 

/* then remove circular dependences */ 

2 while there are cycles in G, 

3     M_c ← set of macros that are in some cycle 

4     remove the macro with largest BDD size in M_c 

5 return a topological ordering of the remaining macros in G 



Fig. 2. Macro-ordering algorithm. 
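
The ordering step can be sketched as follows. This is a hedged illustration, not the authors' code: each macro is reduced to the set of variables its expression mentions, BDD sizes are plain numbers, and the edge direction (a macro is instantiated before the macros whose expressions mention its variable) is an assumption of this sketch. The third macro `v5` is hypothetical and only added to show the resulting order.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def macros_in_cycles(deps):
    """Return the macro variables that lie on some dependence cycle."""
    def reachable(start):
        seen, stack = set(), [w for w in deps[start] if w in deps]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(w for w in deps[n] if w in deps)
        return seen
    return {v for v in deps if v in reachable(v)}

def order_macros(deps, size):
    """deps: macro variable -> variables used by its expression.
    size: macro variable -> (stand-in for) BDD size of its expression."""
    remaining = dict(deps)
    while True:
        cyclic = macros_in_cycles(remaining)
        if not cyclic:
            break
        # drop the macro with the largest size among those on a cycle
        worst = max(sorted(cyclic), key=lambda v: size[v])
        del remaining[worst]
    # keep only edges between surviving macros; predecessors come first
    graph = {v: {w for w in used if w in remaining} for v, used in remaining.items()}
    return list(TopologicalSorter(graph).static_order())

# v1 := v2 AND v3, v2 := v1 OR v4 (the circular pair from the text),
# plus a hypothetical v5 := v1 AND v4
deps = {"v1": {"v2", "v3"}, "v2": {"v1", "v4"}, "v5": {"v1", "v4"}}
size = {"v1": 3, "v2": 5, "v5": 2}
order = order_macros(deps, size)
```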



5 Evaluation 

5.1 Experimental Setup 

The benchmark suite used is a collection of 58 SMV models gathered from a wide variety 
of sources, including the 16 models used in a BDD performance study [16]. Out of these 
58 models, 37 models have no time-invariant constraints, and thus our optimizations 
do not get triggered and have no influence on the overall verification time. Out of the 
remaining 21 models, 10 very small models (< 10 seconds) are eliminated. On the 
remaining 11 models, our optimizations have a non-negligible performance impact 
on 7 models. In Figure 3, we briefly describe these 7 models. Note that some of these 
models are quite large, with up to 1200 state bits. 



Model       # of State Bits   Description 
acs               497         the altitude-control module of the NASA DS1 spacecraft 
dsl-b             657         a buggy fault diagnosis model for the NASA DS1 spacecraft 
dsl               657         corrected version of dsl-b 
futurebus         174         FutureBus cache coherency protocol 
nomad            1273         fault diagnosis model for an Antarctic meteorite explorer 
v-gates            86         reactor-system model 
xavier            100         fault diagnosis model for the Xavier robot 



Fig. 3. Description of models whose performance results are affected by our optimizations. 



We performed the evaluation using the Symbolic Model Verifier (SMV) model 
checker [10] from Carnegie Mellon University. Conjunctive partitioning was used only 
when it was necessary to complete the verification. In these cases (including acs, nomad, 
dsl-b, and dsl), the size limit for each partition was set to 10,000 BDD nodes. For the 
remaining cases, the transition relations were represented as monolithic BDDs. The 
performance of all the benchmark models was measured in the following four settings: 

Base: no new optimizations except that the constrained space C is represented 
as a conjunction with each conjunct’s BDD graph size limited to 10,000 
nodes (i.e., optimizations in Section 3 are used without the “early 
quantification on the constrained space C” optimization). Without this partitioning, 
the BDD representation of the constrained space could not be constructed 
for 4 models. 

Quan: same as the Base case with the addition of the “early quantification on 
the constrained space” optimization (Section 3). 

SynMacro: same as the Quan case with the addition of a syntactic analysis 
that pattern matches deterministic assignment expressions (v == f, where 
v is a state variable and f is an expression) as macros and expands these 
macros. 

BDDMacro: all the optimizations are turned on; i.e., same as the SynMacro 
case with the addition of BDD-based assignment extraction to extract macros. 

The evaluation was performed on a 200MHz Pentium-Pro with 1 GB of memory 
running Linux. Each run was limited to 6 hours of CPU time and 900 MB of memory. 



5.2 Results 

Figure 4 shows the impact of our optimizations for the 7 models whose results changed 
by more than 10 CPU seconds and 10% from the Base case. For all benchmarks, the 
time spent by our optimizations is very small (< 5 seconds or < 5% of total time) and 
is included in the running time reported. 

The overall impact of our optimizations is shown in the rightmost column of 
Figure 4. These results demonstrate that our optimizations have significantly improved the 
performance for 2 cases (with speedups up to 74) and have enabled the verification of 
4 cases. For the v-gates model, the performance degradation (speedup = 0.7) is in the 
computation of the reachable states from the initial states. Upon further investigation, we 
believe that it is caused by the macro instantiation, which increases the graph size of the 
transition relation from 122-thousand to 476-thousand nodes. This case demonstrates 
that reducing the number of state variables does not always improve performance. 



Model       Base    Quan    SynMacro   BDDMacro   Base / BDDMacro 
            (sec)   (sec)   (sec)      (sec)      speedup 
acs         m.o.    32      76         7          enabled 
dsl-b       m.o.    321     138        54         enabled 
dsl         m.o.    m.o.    t.o.       37         enabled 
futurebus   1410    53      35         19         74.2 
nomad       m.o.    t.o.    7801       633        enabled 
v-gates     36      35      53         50         0.7 
xavier      16      5       1          2          8.0 

Fig. 4. Performance impact of each optimization. The m.o.’s and t.o.’s are the results that exceeded 
the 900-MB memory limit and the 6-hour time limit, respectively. 
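
The numeric entries in the speedup column are the ratio of the Base time to the BDDMacro time. A short check using only the timings reported in Figure 4 (the dictionary names are chosen for this sketch) reproduces them:

```python
# Running times in seconds, copied from Figure 4 for the models
# that complete in both the Base and the BDDMacro settings.
base = {"futurebus": 1410, "v-gates": 36, "xavier": 16}
bddmacro = {"futurebus": 19, "v-gates": 50, "xavier": 2}

# Speedup = Base / BDDMacro, rounded to one decimal as in the table.
speedup = {model: round(base[model] / bddmacro[model], 1) for model in base}
```

A value below 1, as for v-gates, means the optimized run is slower than the Base run.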
The remaining columns of Figure 4 show the impact of each optimization. The 
results show that by simply performing early quantification on the constraints (the Quan 
column), we have enabled the verification of acs and dsl-b, and achieved significant 
performance improvement on futurebus (speedup > 20). This is mostly due to the fact 
that a large number of variables can be pulled out of the transition relations and applied 
to conjunctive partitioning and early quantification of the time-invariant constraints 
(Figure 5(a)). With the addition of syntactic analysis for macro extraction (the SynMacro 
column), we are able to verify nomad. Finally, by adding BDD-based macro extraction 
(the BDDMacro column), we are able to verify dsl. The results in Figure 5(b) show 
that BDD-based macro extraction (BDDMacro) can be rather effective in reducing the 
number of variables, especially for the acs, nomad, dsl-b, and dsl models where > 150 
additional BDD variables (i.e., > 75 state bits) are removed in comparison to using 
syntactic analysis (SynMacro). 



                             (a) CP Optimization          (b) Macro Optimization 
            Total # of       # of BDD vars extracted      # of BDD vars removed 
Model       BDD Variables    image      pre-image         SynMacro    BDDMacro 
acs         994              439        449               82          352 
dsl-b       1314             550        566               148         492 
dsl         1314             550        566               220         496 
futurebus   348              58         no                12          18 
nomad       2546             1121       1174              688         844 
v-gates     172              0          17                16          16 
xavier      200              69         86                64          116 

Fig. 5. Effectiveness of each optimization. (a) Number of quantifying BDD variables that are 
pulled out of the transition relation for early quantification of the time-invariant constraints. These 
results are measured without macro optimizations. With macro optimizations, the corresponding 
results are basically the same as subtracting off the number of state variables removed. (b) The 
number of BDD variables removed by macro expansion. Note: the number of BDD variables is 
twice the number of state variables: one copy for the present state and one copy for the next state. 



6 Related Work 

There have been many research efforts on BDD-based redundant state-variable removal 
in both logic synthesis and verification. These research efforts all use the reachable state 
space (set of states reachable from initial states) to determine functional dependencies 
for Boolean variables (macro extraction). The reachable state space effectively plays the 
same role as a time-invariant constraint, because the verification process only needs to 
check specifications in the reachable state space. 

Berthet et al. propose the first redundant state-variable removal algorithm in [3]. In 
[9], Lin and Newton describe a branch-and-bound algorithm to identify the maximum 
set of redundant state variables. In [12], Sentovich et al. propose new algorithms for 
latch removal and latch replacement in logic synthesis. There is also some work on 
detecting and removing redundant state variables while the reachable state space is 
being computed [8,14]. 

From the algorithmic point of view, our approach is different from prior work in 
two ways. First, in determining the relationship between variables, the algorithms used 
to extract functional dependencies in previous work can be viewed as direct extraction 
of deterministic assignments to Boolean variables. In comparison, our assignment 
extraction algorithm is more general because it can also handle non-Boolean variables 
and extract non-deterministic assignments. Second, in performing the redundant 
state-variable removal, the approach used in the previous work would need to combine all 
the constraints first and then extract the macros directly from the combined result. 
However, for constraint-rich models, it may not be possible to combine all the constraints 
because the resulting BDD is too large to build. Our approach addresses this issue by 
first applying the assignment extraction algorithm to each constraint separately and then 
combining the results to determine if a macro can be extracted (see Figure 1). 

Another difference is that in previous work, the goal is to remove as many variables as 
possible. However, we have empirically observed that in some cases, removing additional 
variables can result in significant performance degradation in overall verification time 
(a slowdown of more than a factor of 4). To address this issue, we use simple heuristics (size of the macro and 
the growth in graph sizes) to choose the set of macros to expand. This simple heuristic 
works well in the test cases we tried. However, in order to fully evaluate the impact of 
different heuristics, we need to gather a larger set of constraint-rich models from a wider 
range of applications. 

7 Conclusions and Future Work 

The two optimizations we proposed are crucial in verifying this new class of 
constraint-rich applications. In particular, they have enabled the verification of real-world 
applications such as the Nomad robot and the NASA Deep Space One spacecraft. 

We have shown that the BDD-based assignment-extraction algorithm is effective 
in identifying macros. We plan to use this algorithm to perform a more precise 
cone-of-influence analysis with the assignment expressions providing the exact dependence 
information between the variables. In general, we plan to study how BDDs can be used 
to further help other compile-time optimizations in symbolic model checking. 
Abstract. One of the major problems in applying automatic verification 
tools to industrial-size systems is the excessive amount of memory 
required during the state-space exploration of a model. In the setting 
of real-time, this problem of state-explosion requires extra attention as 
information must be kept not only on the discrete control structure but 
also on the values of continuous clock variables. 

In this paper, we exploit Clock Difference Diagrams, CDD’s, a BDD-like 
data-structure for representing and effectively manipulating certain 
non-convex subsets of the Euclidean space, notably those encountered during 
verification of timed automata. 

A version of the real-time verification tool Uppaal using CDD’s as a 
compact data-structure for storing explored symbolic states has been 
implemented. Our experimental results demonstrate significant 
space-savings: for eight industrial examples, the savings are on average 42% 
with a moderate increase in runtime. 

We further report on how the symbolic state-space exploration itself may 
be carried out using CDD’s. 



1 Motivation 

In the last few years a number of verification tools have been developed for 
real-time systems (e.g. [HHW95,DY95,BLLPW96]). The verification engines of 
most tools in this category are based on reachability analysis of timed automata 
following the pioneering work of Alur and Dill [AD94]. A timed automaton is an 
extension of a finite automaton with a finite set of real-valued clock-variables. 
Whereas the initial decidability results are based on a partitioning of the infinite 
state-space of a timed automaton into finitely many equivalence classes (so-called 
regions), tools such as Kronos and Uppaal are based on more efficient data 
structures and algorithms for representing and manipulating timing constraints 
over clock variables. The abstract reachability algorithm applied in these tools 
is shown in Figure 1. The algorithm checks whether a timed automaton may 
reach a state satisfying a given state formula $\varphi$. It explores the state space 
of the automaton in terms of symbolic states of the form $(l, D)$, where $l$ is a 
control-node and $D$ is a constraint system over clock variables $\{X_1, \ldots, X_n\}$. 
More precisely, $D$ consists of a conjunction of simple clock constraints of the 
form $X_i \ op \ c$, $-X_i \ op \ c$ and $X_i - X_j \ op \ c$, where $c$ is an integer constant and 
$op \in \{<, \le\}$. The subsets of $\mathbb{R}^n$ which may be described by clock constraint 
systems are called zones. Zones are among those convex polyhedra, where all 
edge-points are integer valued, and where border lines may or may not belong 
to the set (depending on a constraint being strict or not). 

Fig. 1. An algorithm for symbolic reachability analysis. 

* BRICS: Basic Research in Computer Science, Centre of the Danish National 
Research Foundation 

We observe that several operations of the algorithm are critical for efficient 
implementation. In particular the algorithm depends heavily on operations for 
checking set inclusion and emptiness. In the computation of the set Next, 
operations for intersection, forward time projection (future) and projection in one 
dimension (clock reset) are required. A well-known data-structure for representing 
clock constraint systems is that of Difference Bounded Matrices, DBM, [Dill87], 
giving for each pair of clocks¹ the upper bound on their difference. All 
operations required in the reachability analysis in Figure 1 can be easily implemented 
on DBM’s with satisfactory efficiency. In particular, the various operations may 
benefit from a canonical DBM representation with tightest bounds on all clock 
differences computed by solving a shortest path problem. However, computation 
of this canonical form should be postponed as much as possible, as it is the most 
costly operation on DBM’s with time-complexity $O(n^3)$ ($n$ being the number of 
clocks). 

DBM’s obviously consume space of order $O(n^2)$. Alternatively, one may 
represent a clock constraint system by choosing a minimal subset from the 
constraints of the DBM in canonical form. This minimal form [LPW95] is preferable 
when adding a symbolic state to the main global data-structure Passed, as in 
practice the space-requirement is only linear in the number of clocks. 

¹ For uniformity, we assume a special clock $X_0$ which is always zero. Thus $X_i \ op \ c$ and 
$-X_i \ op \ c$ can be rewritten as the differences $X_i - X_0 \ op \ c$ and $X_0 - X_i \ op \ c$. 
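
The DBM operations mentioned above can be sketched in a few lines. This is a simplified illustration, not the Uppaal implementation: only non-strict bounds are modelled, entry `d[i][j]` is an upper bound on $X_i - X_j$ with index 0 as the always-zero clock, and the function names and example zones are assumptions of this sketch. The canonical form is computed with the Floyd-Warshall all-pairs shortest-path algorithm, which accounts for the cubic cost.

```python
import math

INF = math.inf  # "no bound"

def canonical(d):
    """Tighten all bounds via all-pairs shortest paths (Floyd-Warshall, O(n^3))."""
    n = len(d)
    c = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if c[i][k] + c[k][j] < c[i][j]:
                    c[i][j] = c[i][k] + c[k][j]
    return c

def is_empty(d):
    """A zone is empty iff tightening produces a negative cycle."""
    c = canonical(d)
    return any(c[i][i] < 0 for i in range(len(c)))

def included(d1, d2):
    """Zone(d1) is a subset of Zone(d2): compare the canonical form of d1 with d2."""
    if is_empty(d1):
        return True
    c1 = canonical(d1)
    n = len(c1)
    return all(c1[i][j] <= d2[i][j] for i in range(n) for j in range(n))

# Clocks X1, X2 (indices 1, 2): 1 <= X1 <= 3 and X1 = X2
zone = [[0, -1, 0],
        [3, 0, 0],
        [INF, 0, 0]]
# 0 <= X1 <= 5, X2 unconstrained apart from X2 >= 0
wide = [[0, 0, 0],
        [5, 0, INF],
        [INF, INF, 0]]
# X1 <= 1 and X1 >= 2 (one clock): contradictory
contradictory = [[0, -2],
                 [1, 0]]

tight = canonical(zone)
```

After tightening, the bounds on $X_2$ become $1 \le X_2 \le 3$, which were only implicit in the input matrix.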

Considering once again the reachability algorithm in Figure 1, we see that a 
symbolic state $(l, D)$ from the waiting-list Wait is freed from being explored (the 
inner box) provided some symbolic state $(l, D')$ already in Passed ’covers’ it (i.e. 
$D \subseteq D'$). Though clearly a sound rule and provably sufficient for termination of 
the algorithm, exploration of $(l, D)$ may be avoided under less strict conditions. 
In particular, it suffices for $(l, D)$ to be ’covered’ collectively by the symbolic 
states in Passed with location $l$, i.e.: 

$$ D \subseteq \bigcup \{ D' \mid (l, D') \in \mathit{Passed} \} \qquad (1) $$

However, this requires handling of unions of zones, which complicates things 
considerably. Using DBM’s, finite unions of zones - which we will call 
federations in the following - may be represented by a list of all the DBM’s of the 
union. However, the more “non-convex” the zone becomes, the more DBM’s will 
be needed. In particular, this representation makes the inclusion-check of (1) 
computationally expensive. 
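
The difference between the pairwise test $D \subseteq D'$ and the collective test (1) can be seen already with a single clock, where zones are intervals. The sketch below is a deliberately simplified illustration under that assumption (closed intervals given as `(lo, hi)` pairs, hypothetical function names); with several clocks the collective check is considerably harder, which is the point made in the text.

```python
def covered_by_single(zone, passed):
    """Pairwise rule: some single stored zone contains `zone`."""
    lo, hi = zone
    return any(p_lo <= lo and hi <= p_hi for p_lo, p_hi in passed)

def covered_by_union(zone, passed):
    """Collective rule (1): the union of the stored zones contains `zone`."""
    lo, hi = zone
    reach = None                       # right end of the covered prefix of [lo, hi]
    for p_lo, p_hi in sorted(passed):
        if p_hi < lo:
            continue                   # lies entirely to the left of the zone
        start = lo if reach is None else reach
        if p_lo > start:
            return False               # a gap that no later interval can close
        reach = p_hi if reach is None else max(reach, p_hi)
        if reach >= hi:
            return True
    return False

passed = [(0, 3), (2, 6)]              # zones already explored for some location
new_zone = (0, 5)
```

Here `new_zone` is not contained in either stored interval, yet it is contained in their union, so only the collective rule lets the algorithm skip it.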

In this paper, we introduce a more efficient BDD-like data-structure for 
federations, Clock Difference Diagrams, CDD’s. A CDD is a directed acyclic graph, 
where inner nodes are associated with a given pair of clocks and outgoing arcs 
state bounds on their difference. This data-structure contains DBM’s as a special 
case and offers simple boolean set-operations and easy inclusion- and 
emptiness-checking. Using CDD’s, the PASSED-list may be implemented as a collection of 
symbolic states of the form $(l, F)$, where $F$ is a CDD representing the union of all 
zones for which the location $l$ has been explored². Thus, the more liberal 
termination condition of (1) may be applied, potentially leading to faster termination 
of the reachability algorithm. As any BDD-like data-structure, CDD’s eliminate 
redundancies via sharing of substructures. Thus, the CDD representation of $F$ 
is likely to be much smaller than the explicit DBM-list representation. 
Furthermore, sharing of identical substructures between CDD’s from different symbolic 
states may be obtained for free, opening the way for even more efficient storage usage. 

Having implemented a CDD-package and used it in modifying Uppaal, we 
report on some very encouraging experimental results. For eight industrial 
examples found in the literature, significant space-savings are obtained: the savings 
are on average 42% with a moderate increase in run-time (on average an increase 
of 7%). 

To make the reachability algorithm of Figure 1 fully symbolic, it remains to 
show how to compute the successor set Next based on CDD’s. In particular, 
algorithms are needed for computing forward projection in time and clock-reset 
for this data-structure. Similar to the canonical form for DBM’s, these operations 
are obtained via a canonical CDD form, where bounds on all arcs are as tight 
as possible. 

² Thus $D$ is simply unioned with $F$, when a new symbolic state $(l, D)$ is added to the 
PASSED-list (cf. Fig. 1, line (+)). 
Related Work. The work in [Bal96] and [WTD95] represent early attempts of 
applying BDD-technology to the verification of continuous real-time systems. In 
[Bal96], DBM’s themselves are coded as BDD’s. However, unions of DBM’s are 
avoided and replaced by convex hulls leading to an approximation algorithm. In 
[WTD95], BDD’s are applied to a symbolic representation of the discrete control 
part, whereas the continuous part is dealt with using DBM’s. 

The Numerical Decision Diagrams of [ABKMPR97,BMPY97] offer a 
canonical representation of unions of zones, essentially via a BDD-encoding of the 
collection of regions covered by the union. [CC95] offers a similar BDD-encoding 
in the simple case of one-clock automata. In both cases, the encodings are 
extremely sensitive to the size of the in-going constants. As we will indicate, NDD’s 
may be seen as degenerate CDD’s requiring very fine granularity. 

CDD’s are in the spirit of Interval Decision Diagrams of [ST98]. In [Strehl98], 
IDD’s are used for analysis in a discrete, one-clock setting. Whereas IDD’s nodes 
are associated with independent real-valued variables, CDD-nodes - being 
associated with differences - are highly dependent. Thus, the subset- and emptiness 
checking algorithms for CDD’s are substantially different. Also, the canonical 
form requires additional attention, as bounds on different arcs along a path may 
interact. 

The CDD datastructure was first introduced in [LPWW98], where a thorough 
study of various possible normalforms is given. A similar datastructure has 
recently been introduced in [MLAH99a,MLAH99b]. 



2 Timed Automata 



Timed automata were first introduced in [AD94] and have since then established 
themselves as a standard model for real-time systems. We assume familiarity 
with this model and only give a brief review in order to fix the terminology and 
notation used in this paper. 

A timed automaton is a standard finite-state automaton extended with a 
finite collection of real-valued clocks. The nodes (often called (control) nodes) 
are labelled with an invariant. Transitions are labelled with a guard, a clock reset 
and a synchronisation. Guards and invariants are clock constraints. Intuitively, 
a timed automaton starts execution with all clocks set to zero. Clocks increase 
uniformly with time while the automaton is within a node. The automaton can 
only stay within a node while the clocks fulfill the node’s invariant. A transition 
can be taken if the clocks fulfill the guard. By taking the transition, all clocks 
in the clock reset will be set to zero, while the remaining keep their values. 
Thus transitions occur instantaneously. Semantically, a state of an automaton 
is a pair of a control node and a clock valuation, i.e. the current setting of 
the clocks. Transitions in the semantic interpretation are either labelled with 
a synchronisation (if it is an instantaneous switch from the current node to 
another) or with a positive time delay (if the automaton stays within a node 
letting time pass). 
For the formal definition, we denote the clocks by $C = \{X_1, \ldots, X_n\}$ and 
use $B(C)$, ranged over by $g$ and $D$, to denote the set of clock constraint systems 
over $C$. 

Definition 1. A timed automaton $A$ over clocks $C$ is a tuple $(N, l_0, E, I)$ where 
$N$ is a finite set of nodes (control-nodes), $l_0$ is the initial node, $E \subseteq N \times B(C) \times 
2^C \times N$ corresponds to the set of edges, and finally, $I : N \to B(C)$ assigns 
invariants to nodes. In the case $(l, g, r, l') \in E$, we write $l \xrightarrow{g,r} l'$. 

Formally, we represent the values of clocks as functions (called clock assignments) 
from $C$ to the non-negative reals $\mathbb{R}_{\ge 0}$. We denote by $\mathcal{V}$ the set of clock assignments 
for $C$. A semantical state of an automaton $A$ is now a pair $(l, u)$, where $l$ is a 
node of $A$ and $u$ is a clock assignment for $C$, and the semantics of $A$ is given by 
a transition system with the following two types of transitions (corresponding 
to delay-transitions and edge-transitions): 

- $(l, u) \xrightarrow{d} (l, u + d)$ if $I(l)(u)$ and $I(l)(u + d)$ 

- $(l, u) \to (l', u')$ if there exist $g, r$ such that $l \xrightarrow{g,r} l'$, $u \in g$, $u' = [r \to 0]u$, 
$I(l)(u)$ and $I(l')(u')$ 

where for $d \in \mathbb{R}_{\ge 0}$, $u + d$ denotes the time assignment which maps each clock $X$ 
in $C$ to the value $u(X) + d$, and for $r \subseteq C$, $[r \to 0]u$ denotes the assignment for 
$C$ which maps each clock in $r$ to the value 0 and agrees with $u$ over $C \setminus r$. By 
$u \in g$ we denote that the clock assignment $u$ satisfies the constraint $g$ (in the 
obvious manner). 
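
The two transition rules can be exercised on a toy automaton. The following is a minimal sketch under stated assumptions: constraints are modelled as Python predicates over a valuation dictionary, the names `Edge`, `delay` and `take` are hypothetical, and the automaton with nodes "idle" and "busy" is invented for illustration. Checking the invariant only at both ends of a delay matches the rule above and suffices because invariants are convex clock constraints.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Valuation = Dict[str, float]
Constraint = Callable[[Valuation], bool]

@dataclass(frozen=True)
class Edge:
    source: str
    guard: Constraint
    reset: FrozenSet[str]
    target: str

def delay(inv, state, d):
    """Delay transition: (l, u) -> (l, u + d) if I(l)(u) and I(l)(u + d)."""
    l, u = state
    later = {x: t + d for x, t in u.items()}
    if d >= 0 and inv[l](u) and inv[l](later):
        return (l, later)
    return None                        # transition not enabled

def take(inv, state, edge):
    """Edge transition: guard holds, clocks in r are reset, invariants hold."""
    l, u = state
    if edge.source != l or not edge.guard(u) or not inv[l](u):
        return None
    u2 = {x: (0.0 if x in edge.reset else t) for x, t in u.items()}
    return (edge.target, u2) if inv[edge.target](u2) else None

# Toy automaton: "idle" has invariant x <= 5; the edge to "busy"
# requires x >= 2 and resets x.
inv = {"idle": lambda u: u["x"] <= 5, "busy": lambda u: True}
go = Edge("idle", lambda u: u["x"] >= 2, frozenset({"x"}), "busy")

start = ("idle", {"x": 0.0, "y": 0.0})
after_delay = delay(inv, start, 3.0)
after_edge = take(inv, after_delay, go)
```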

Clearly, the semantics of a timed automaton yields an infinite transition 
system, and is thus not an appropriate basis for decision algorithms. However, 
efficient algorithms may be obtained using a finite-state symbolic semantics 
based on symbolic states of the form $(l, D)$, where $D \in B(C)$ [HNSY94,YPD94]. 
The symbolic counterpart to the standard semantics is given by the following 
two (fairly obvious) types of symbolic transitions: 

- $(l, D) \leadsto (l, (D \wedge I(l))^{\uparrow} \wedge I(l))$ 

- $(l, D) \leadsto (l', r(g \wedge D \wedge I(l)) \wedge I(l'))$ if $l \xrightarrow{g,r} l'$ 

where time progress $D^{\uparrow} = \{u + d \mid u \in D \wedge d \in \mathbb{R}_{\ge 0}\}$ and clock reset $r(D) = 
\{[r \to 0]u \mid u \in D\}$. It may be shown that $B(C)$ (the set of constraint systems) is 
closed under these two operations ensuring the well-definedness of the semantics. 
Moreover, the symbolic semantics corresponds closely to the standard semantics 
in the sense that, whenever $u \in D$ and $(l, D) \leadsto (l', D')$, then $(l, u) \to (l', u')$ 
for some $u' \in D'$. 



3 Clock Difference Diagrams 

While in principle DBM’s are an efficient implementation for clock constraint 
systems, especially when using canonical form only when necessary and minimal 
form when suitable, they are not very good at handling unions of zones. In this 
section we will introduce a more efficient data structure for federations: clock 
difference diagrams, or CDD’s for short. A CDD is a directed acyclic graph with 
two kinds of nodes: inner nodes and terminal nodes. Terminal nodes represent 
the constants true and false, while inner nodes are associated with a type (i.e. 
a clock pair) and arcs labeled with intervals giving bounds on the clock pair’s 
difference. Figure 2 shows examples of CDD’s. 

A CDD is a compact representation of a decision tree for federations: take a 
valuation, and follow the unique path along which the constraints given by type 
and interval are fulfilled by the valuation. If this process ends at a true node, 
the valuation belongs to the federation represented by this CDD, otherwise not. 
A CDD itself is not a tree, but a DAG due to sharing of isomorphic subtrees. 

A type is a pair $(i, j)$ where $1 \le i < j \le n$. The set of all types is written $T$, 
with typical element $t$. We assume that $T$ is equipped with a linear ordering $\sqsubseteq$ 
and a special bottom element $(0, 0) \in T$, in the same way as BDD’s assume a 
given ordering on the boolean variables. By $\mathcal{I}$ we denote the set of all non-empty, 
convex, integer-bounded subsets of the real line. Note that the integer bound may 
or may not be within the interval. A typical element of $\mathcal{I}$ is denoted $I$. We write 
$\mathcal{I}_\emptyset$ for the set $\mathcal{I} \cup \{\emptyset\}$. 

In order to relate intervals and types to constraints, we introduce the following 
notation: (i) given a type $(i, j)$ and an interval $I$ of the reals, by $I(i, j)$ we denote 
the clock constraint having type $(i, j)$ which restricts the value of $X_i - X_j$ to 
the interval $I$; (ii) given a clock constraint $D$ and a valuation $v$, by $D(v)$ we 
denote the application of $D$ to $v$, i.e. the boolean value derived from replacing 
the clocks in $D$ by the values given in $v$. 

Note that typically we will use the notation jointly, i.e. $I(i, j)(v)$ expresses 
the fact that $v$ fulfills the constraint given by the interval $I$ and the type $(i, j)$. 
This allows us to give the definition of a CDD: 

Definition 2 (Clock Difference Diagram). A Clock Difference Diagram 
(CDD) is a directed acyclic graph consisting of a set of nodes $V$ and two functions 
$\mathrm{type} : V \to T$ and $\mathrm{succ} : V \to 2^{\mathcal{I} \times V}$ such that 

- $V$ has exactly two terminal nodes called True and False, where $\mathrm{type}(\mathrm{True}) = 
\mathrm{type}(\mathrm{False}) = (0, 0)$ and $\mathrm{succ}(\mathrm{True}) = \mathrm{succ}(\mathrm{False}) = \emptyset$. 

- all other nodes $n \in V$ are inner nodes, which have attributed a type $\mathrm{type}(n) \in 
T$ and a finite set of successors $\mathrm{succ}(n) = \{(I_1, n_1), \ldots, (I_k, n_k)\}$, where 
$(I_i, n_i) \in \mathcal{I} \times V$. 

We shall write $n \xrightarrow{I} m$ to indicate that $(I, m) \in \mathrm{succ}(n)$. For each inner node $n$, 
the following must hold: 

- the successors are disjoint: for $(I, m), (I', m') \in \mathrm{succ}(n)$ either $(I, m) = 
(I', m')$ or $I \cap I' = \emptyset$, 

- the successor set is an $\mathbb{R}$-cover: $\bigcup \{I \mid \exists m.\ n \xrightarrow{I} m\} = \mathbb{R}$, 

- the CDD is ordered: for all $m$, whenever $n \xrightarrow{I} m$ then $\mathrm{type}(m) \sqsubseteq \mathrm{type}(n)$. 

Further, the CDD is assumed to be reduced, i.e. 

- it has maximal sharing: for all $n, m \in V$, $\mathrm{succ}(n) = \mathrm{succ}(m)$ implies $n = m$, 

- it has no trivial edges: whenever $n \xrightarrow{I} m$ then $I \ne \mathbb{R}$, 

- all intervals are maximal: whenever $n \xrightarrow{I_1} m$ and $n \xrightarrow{I_2} m$ then $I_1 = I_2$ or 
$I_1 \cup I_2 \notin \mathcal{I}$. 

Fig. 2. Three example CDD’s. Intervals not shown lead implicitly to False 

Note that we do not require a special root node. Instead each node can be 
chosen as the root node, and the sub-DAG underneath this node is interpreted 
as describing a (possibly non-convex) set of clock valuations. This allows for 
sharing not only within a representation of one set of valuations, but between all 
representations. Figure 2 gives some examples of CDD’s. The following definition 
makes precise how to interpret such a DAG: 

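Definition 3. Given a CDD $(V, type, succ)$, each node $n \in V$ is assigned a
semantics $[\![n]\!] \subseteq \mathcal{V}$, where $\mathcal{V}$ is the set of clock valuations, recursively defined by

— $[\![False]\!] := \emptyset$, $[\![True]\!] := \mathcal{V}$,

— $[\![n]\!] := \{v \in \mathcal{V} \mid n \xrightarrow{I} m,\ I(type(n))(v) = true,\ v \in [\![m]\!]\}$ for $n$ an inner node.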
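The recursive semantics can be read as a membership test. The following is a minimal sketch, not the authors' implementation. It assumes that a type (i, j) stands for the difference x_i - x_j with x_0 fixed to 0, that intervals are encoded as tuples (lo, lo_closed, hi, hi_closed), and that an inner node is a pair (type, successor list); all of these encodings are illustrative choices.

```python
import math

INF = math.inf
TRUE, FALSE = "True", "False"   # the two terminal nodes

def in_interval(x, interval):
    """interval = (lo, lo_closed, hi, hi_closed); an assumed encoding."""
    lo, lo_closed, hi, hi_closed = interval
    above = x > lo or (lo_closed and x == lo)
    below = x < hi or (hi_closed and x == hi)
    return above and below

def contains(node, v):
    """True iff valuation v (clock index -> value, with v[0] == 0) lies in [[node]]."""
    if node == TRUE:
        return True
    if node == FALSE:
        return False
    (i, j), successors = node            # inner node: (type, successor list)
    diff = v[i] - v[j]                   # assumed reading of type (i, j)
    return any(in_interval(diff, I) and contains(m, v) for I, m in successors)

# Hand-built CDD for the zone 1 <= x1 <= 3 and x1 - x2 = 0.
x1_node = ((1, 0), [((-INF, False, 1, False), FALSE),
                    ((1, True, 3, True), TRUE),
                    ((3, False, INF, False), FALSE)])
root = ((1, 2), [((-INF, False, 0, False), FALSE),
                 ((0, True, 0, True), x1_node),
                 ((0, False, INF, False), FALSE)])

print(contains(root, {0: 0, 1: 2.0, 2: 2.0}))   # a valuation inside the zone
print(contains(root, {0: 0, 1: 2.0, 2: 2.5}))   # violates x1 - x2 = 0
```

Because the successor intervals of a node are disjoint and cover the reals, exactly one edge is followed at each node, so the test walks a single path from the chosen root to a terminal.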

For BDD’s and IDD’s, testing for equality can be achieved easily due to their
canonicity: the test is reduced to a purely syntactical comparison. However, in the
case of CDD’s canonicity is not achieved in the same straightforward manner.

To see this, we give an example of two reduced CDD’s in Figure 3(a) describing
the same set. The two CDD’s are however not isomorphic. The problem
with CDD’s, in contrast to IDD’s, is that the different types of constraints in
the nodes are not independent, but influence each other. In the above example
obviously $1 < X < 3$ and $X = Y$ already imply $1 < Y < 3$. The constraint on $Y$
in the CDD on the right hand side is simply too loose. Therefore a step towards
an improved normal form is to require that on all paths, the constraints should
be the tightest possible. We turn back to this issue in the final section.

Fig. 3. (a) Two reduced CDD’s for the same zone, (b) A tightened CDD
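How a loose bound is implied away can be illustrated on a single zone with the usual shortest-path closure of a difference bound matrix. This is a sketch under simplifying assumptions: only non-strict bounds are used, d[i][j] is an upper bound on x_i - x_j, index 0 is the constant clock 0, and the loose bound 0 <= Y <= 5 is invented for the illustration.

```python
def tighten(d):
    """Floyd-Warshall closure: d[i][j] is an upper bound on x_i - x_j (x_0 = 0)."""
    n = len(d)
    d = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# Index 0 is the constant 0, index 1 is X, index 2 is Y.
# 1 <= X <= 3, X = Y, and a deliberately loose bound 0 <= Y <= 5.
loose = [
    [0, -1, 0],   # 0 - X <= -1,  0 - Y <= 0
    [3,  0, 0],   # X - 0 <= 3,   X - Y <= 0
    [5,  0, 0],   # Y - 0 <= 5,   Y - X <= 0
]
tight = tighten(loose)
print(tight[2][0], -tight[0][2])   # upper and lower bound on Y after tightening
```

After the closure the bounds on Y coincide with those on X, which is the tightened form that a normal form would have to enforce along every path of a CDD.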



4 Operations on CDD’s

Simple Operations. Three important operations on CDD’s, namely union,
intersection and complement, can be defined analogously to IDD’s. All use
a function makenode which for a given type $t$ and a successor set
$S = \{(I_1, n_1), \ldots, (I_k, n_k)\}$ will either return the unique node in the given CDD
$C = (V, type, succ)$ having these attributes or, in case no such node exists, add a new
node to the CDD with the given attributes. This operation, shown in Figure 4,
is important in order to keep the CDD reduced. Note that using a
hash table to identify nodes already in $V$, makenode can be implemented to run
in constant time. Note further that makenode itself uses an operation reduce,
not given in this paper, which ensures that $S$ itself is reduced, i.e. it has
maximal sharing, no trivial edges and all intervals are maximal. Additionally,
$S$ is required to be well-formed, i.e. all intervals must be disjoint and form an
$\mathbb{R}$-cover.

Then union can be defined as in Figure 4. Intersection is computed by replacing
“union” by “intersect” everywhere in the definition of the union operation,
and additionally adjusting the base cases. The complement is computed by
essentially swapping the True and False nodes (see the footnote on the operation cache).
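A possible rendering of union in executable form is sketched below. It rests on several assumptions that are not taken from the paper: inner nodes are immutable tuples (type, frozenset of edges), so structural equality stands in for the hash table; makenode performs only a simplified reduce; and nodes nearer the root carry the larger type in Python's tuple order, so for nodes of different type the result branches on the larger one. With the opposite ordering convention only that comparison changes.

```python
import math
from functools import lru_cache

INF = math.inf
TRUE, FALSE = "True", "False"          # terminal nodes

def intersect(a, b):
    """Intersect intervals given as (lo, lo_closed, hi, hi_closed); None if empty."""
    lo, lo_open = max((a[0], not a[1]), (b[0], not b[1]))
    hi, hi_closed = min((a[2], a[3]), (b[2], b[3]))
    if lo > hi or (lo == hi and (lo_open or not hi_closed)):
        return None
    return (lo, not lo_open, hi, hi_closed)

def makenode(t, successors):
    """Simplified makenode: joins neighbouring intervals with equal targets
    and drops trivial edges."""
    merged = []
    for iv, child in sorted(successors, key=lambda e: (e[0][0], not e[0][1])):
        if merged and merged[-1][1] == child:
            prev = merged[-1][0]
            merged[-1] = ((prev[0], prev[1], iv[2], iv[3]), child)
        else:
            merged.append((iv, child))
    if len(merged) == 1:               # a single edge labelled with all of R
        return merged[0][1]
    return (t, frozenset(merged))

@lru_cache(maxsize=None)               # plays the role of the operation cache
def union(n1, n2):
    if n1 == TRUE or n2 == TRUE:
        return TRUE
    if n1 == FALSE:
        return n2
    if n2 == FALSE:
        return n1
    (t1, s1), (t2, s2) = n1, n2
    if t1 == t2:
        succ = []
        for i1, m1 in s1:
            for i2, m2 in s2:
                iv = intersect(i1, i2)
                if iv is not None:
                    succ.append((iv, union(m1, m2)))
        return makenode(t1, succ)
    if t1 > t2:                        # n1 must stay nearer the root
        return makenode(t1, [(i1, union(m1, n2)) for i1, m1 in s1])
    return makenode(t2, [(i2, union(n1, m2)) for i2, m2 in s2])

def interval_node(t, lo, hi):
    """CDD for lo <= (difference of type t) <= hi, with finite lo and hi."""
    return makenode(t, [((-INF, False, lo, False), FALSE),
                        ((lo, True, hi, True), TRUE),
                        ((hi, False, INF, False), FALSE)])

a = interval_node((1, 0), 1, 3)
b = interval_node((1, 0), 3, 5)
print(union(a, b) == interval_node((1, 0), 1, 5))
```

Intersection follows the same recursion with the base cases exchanged, and complement swaps the two terminals, as described above.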

Footnote: As for the BDD apply-operator, using a hashed operation-cache is needed to avoid
recomputation of the same operation for the same arguments.

From constraint systems to CDD’s. The reachability algorithm of Uppaal
currently works with constraint systems (represented either as canonical DBM’s
or in the minimal form). The desired reachability algorithm will need to combine
and compare DBM’s obtained from exploration of the timed automaton with
CDD’s used as a compact representation of the PASSED-list.

makenode(t, S):
    reduce(S)
    if ∃n ∈ V. type(n) = t ∧ succ(n) = S then return n
    else V := V ∪ {n}    // where n is a fresh node
         type := type ∪ {n ↦ t}; succ := succ ∪ {n ↦ S}
         return n
    endif

union(n1, n2):
    if n1 = True or n2 = True then return True
    elseif n1 = False then return n2
    elseif n2 = False then return n1
    else if type(n1) = type(n2) then
             return makenode(type(n1), {(I1 ∩ I2, union(n1', n2')) |
                                        n1 —I1→ n1', n2 —I2→ n2', I1 ∩ I2 ≠ ∅})
         elseif type(n1) ⊏ type(n2) then
             return makenode(type(n1), {(I1, union(n1', n2)) | n1 —I1→ n1'})
         elseif type(n2) ⊏ type(n1) then
             return makenode(type(n2), {(I2, union(n1, n2')) | n2 —I2→ n2'})
         endif
    endif

makeCDD(D):
    n := True
    for t ∈ T \ {(0, 0)} do    // use ordering ⊏
        I := I_D(t)
        if I ≠ ℝ then
            if lo(I) = ∅ then n := makenode(t, {(I, n), (hi(I), False)})
            elseif hi(I) = ∅ then n := makenode(t, {(I, n), (lo(I), False)})
            else n := makenode(t, {(I, n), (hi(I), False), (lo(I), False)})
            endif
        endif
    endfor
    return n

subset(D, n):
    if D = false or n = True then return true
    elseif n = False then return false
    else return ⋀ { subset(D ∧ I(type(n)), m) | n —I→ m }
    endif

Fig. 4. Algorithms

For the following we assume that a constraint system $D$ holds at most one
simple constraint for each pair of clocks $x_i, x_j$ (which is obviously true for
DBM’s and the minimal form). Let $D(i,j)$ be the set of all simple constraints
of type $(i,j)$, i.e. those for $x_i - x_j$ and $x_j - x_i$. The constraint system $D(i,j)$
gives an upper and/or a lower bound for $x_i - x_j$. If not present, choose $-\infty$ as
lower and $+\infty$ as upper bound. Denote the interval defined thus by $I_D(i,j)$.

Further, given an interval $I \in \mathcal{I}$, let $lo(I) := \{r \in \mathbb{R} \mid \forall r' \in I.\ r < r'\}$
be the set of lower bounds and $hi(I) := \{r \in \mathbb{R} \mid \forall r' \in I.\ r > r'\}$ the set of
upper bounds. Note that $lo(I)$ and $hi(I)$ are always either empty or again intervals.
Using this notation, a simple
algorithm makeCDD for constructing a CDD from a constraint system can be
given as in Figure 4. Using this, we can easily union zones to a CDD as required
in the modified reachability algorithm of Uppaal (cf. footnote 2). Note that for
this asymmetric union it is advisable to use the minimal form representation for
the zone, as this will lead to a smaller CDD, and subsequently to a faster and
less space-consuming union operation.
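The construction can be sketched as follows. The encoding is an assumption made for the example: a constraint system is a dictionary from types (i, j) to the interval allowed for x_i - x_j, intervals are tuples (lo, lo_closed, hi, hi_closed), nodes are plain tuples rather than hash-consed objects, and types are visited in increasing tuple order so that the largest constrained type ends up at the root.

```python
import math

INF = math.inf
REALS = (-INF, False, INF, False)      # intervals are (lo, lo_closed, hi, hi_closed)
TRUE, FALSE = "True", "False"

def lo(I):
    """Interval of all strict lower bounds of I, or None if there are none."""
    return None if I[0] == -INF else (-INF, False, I[0], not I[1])

def hi(I):
    """Interval of all strict upper bounds of I, or None if there are none."""
    return None if I[2] == INF else (I[2], not I[3], INF, False)

def make_cdd(D, types):
    """D maps a type (i, j) to the interval allowed for x_i - x_j."""
    n = TRUE
    for t in sorted(types):            # smallest type ends up nearest the terminals
        I = D.get(t, REALS)
        if I != REALS:                 # unconstrained types get no node
            outside = [(part, FALSE) for part in (lo(I), hi(I)) if part is not None]
            n = (t, frozenset([(I, n)] + outside))
    return n

def in_interval(x, I):
    return (x > I[0] or (I[1] and x == I[0])) and (x < I[2] or (I[3] and x == I[2]))

def contains(node, v):
    """Membership of valuation v (clock index -> value, v[0] == 0) in [[node]]."""
    if node in (TRUE, FALSE):
        return node == TRUE
    (i, j), successors = node
    return any(in_interval(v[i] - v[j], I) and contains(m, v) for I, m in successors)

# Zone 1 <= x1 <= 3 and x1 - x2 = 0; the type (2, 0) is left unconstrained.
zone = {(1, 0): (1, True, 3, True), (1, 2): (0, True, 0, True)}
cdd = make_cdd(zone, [(1, 0), (1, 2), (2, 0)])
print(contains(cdd, {0: 0, 1: 2, 2: 2}), contains(cdd, {0: 0, 1: 2, 2: 3}))
```

The result is a single chain of nodes with all side edges leading to False, which is why a representation with fewer constraints, such as the minimal form, directly yields a smaller diagram.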

Crucial Operations. Testing for equality and set-inclusion of CDD’s is not
easy without utilizing a normal form. Looking at the test given in (1) it is
however evident that all we need is to test for inclusion between a zone and a
CDD. Such an asymmetric test for a zone $Z$ and a CDD $n$ can be implemented
as shown in Figure 4 without need for canonicity.

Note that when testing for emptiness of a DBM as in the first if-statement,
we need to compute its canonical form. If we know that the DBM is already
in canonical form, the algorithm can be improved by passing $D \wedge I(type(n))$ in
canonical form. As $D \wedge I(type(n))$ adds no more than two constraints to the
zone, computation of the canonical form can be done faster than in the general
case, which would be necessary in the test $D = false$.

The above algorithm can also be used to test for emptiness of a CDD using
$empty(n) := subset(true, complement(n))$, where true is the empty set of constraints,
fulfilled by every valuation.

As testing for set inclusion $C_1 \subseteq C_2$ of two CDD’s $C_1, C_2$ is equivalent to
testing for emptiness of $C_1 \cap \overline{C_2}$, also this check can be done without needing
canonicity.
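The asymmetric inclusion test can be sketched on top of a small DBM. The representation is an assumption for illustration: a bound is a pair (c, s) with s = 0 for a strict and s = 1 for a non-strict bound, d[i][j] bounds x_i - x_j, index 0 is the constant clock, and the canonical form is recomputed from scratch rather than incrementally as the text suggests.

```python
import math

INF = math.inf
LT, LE = 0, 1                           # strict bounds sort before non-strict ones
TRUE, FALSE = "True", "False"

def close(d):
    """Canonical form of a DBM by all-pairs shortest paths; None if the zone is empty."""
    n = len(d)
    d = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = (d[i][k][0] + d[k][j][0], min(d[i][k][1], d[k][j][1]))
                d[i][j] = min(d[i][j], via_k)
    if any(d[i][i] < (0, LE) for i in range(n)):
        return None                     # negative cycle: no valuation satisfies d
    return d

def constrain(d, t, interval):
    """Add x_i - x_j in interval for t = (i, j); interval = (lo, lo_closed, hi, hi_closed)."""
    (i, j), (lo, lo_closed, hi, hi_closed) = t, interval
    d = [row[:] for row in d]
    if hi < INF:
        d[i][j] = min(d[i][j], (hi, LE if hi_closed else LT))
    if lo > -INF:
        d[j][i] = min(d[j][i], (-lo, LE if lo_closed else LT))
    return d

def subset(d, node):
    """Is the zone d (a DBM, or None for the empty zone) included in [[node]]?"""
    if d is None or node == TRUE:
        return True
    if node == FALSE:
        return False
    t, successors = node
    return all(subset(close(constrain(d, t, iv)), m) for iv, m in successors)

# CDD for 1 <= x1 <= 3 and two zones over the single clock x1.
cdd = ((1, 0), [((-INF, False, 1, False), FALSE),
                ((1, True, 3, True), TRUE),
                ((3, False, INF, False), FALSE)])
zone_small = [[(0, LE), (-1, LE)], [(2, LE), (0, LE)]]   # 1 <= x1 <= 2
zone_large = [[(0, LE), (-1, LE)], [(4, LE), (0, LE)]]   # 1 <= x1 <= 4
print(subset(zone_small, cdd), subset(zone_large, cdd))
```

Each edge contributes at most two new bounds to the zone, which is what makes the incremental recomputation of the canonical form mentioned above attractive.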

5 Implementation and Experimental Results 

This section presents the results of an experiment where both the current
version of Uppaal (more precisely version 2.19.2, which is the most recent version
currently used in-house) and an experimental CDD-based version were used to verify
eight industrial examples found in the literature, including a gearbox controller [LPY98],
various communication protocols used in Philips audio equipment (see [BPV94],
[DKRT97], [BGK+96]) and in B&O audio/video equipment [HSLL97,HLS98],
and the start-up algorithm of the DACAPO protocol [LPY97], as well as
Fischer’s protocol for mutual exclusion.

In Table 1 we present the space requirements and runtime of the examples
on a Sun UltraSPARC 2 equipped with 512 MB of primary memory and two 170
MHz processors. Each example was verified using the current purely DBM-based
algorithm of Uppaal (Current), and three different CDD-based algorithms. The
first (CDD) uses CDD’s to represent the continuous part of the PASSED-list, the
second (Reduced) is identical to CDD except that all inconsistent paths, i.e.
those representing the empty set, are removed from the CDD’s, and the third
(CDD+BDD) extends CDD’s with a BDD-based representation of the discrete
part in order to achieve a fully symbolic representation of the PASSED-list. As can
be seen, our CDD-based modification of Uppaal leads to significant space
savings (on average 42%) with only a moderate increase in run-time (on average
7%). When inconsistent paths are eliminated the average space saving increases
to 55% at the cost of an average increase in run-time of 54%. If we only consider
the industrial examples the average space saving of CDD is 49% while the
average increase in run-time is below 0.5%. Perhaps unexpectedly, CDD+BDD
leads to degraded performance in both time and space when compared with
Current. Additionally, a closer look at the usage of the Wait-list reveals that
the less strict termination condition of (1) leads to faster termination in only a
few cases. This offers a good explanation for the lack of run-time improvement.

Table 1. Performance statistics for a number of systems. P is the number of processes,
V the number of discrete variables, and C the number of clocks in the system. All times
are in seconds and space usage in kilobytes. Space usage only includes memory required
to store the PASSED-list.

                               Current            CDD        Reduced        CDD+BDD
System        P   V   C    Time  Space    Time  Space    Time  Space    Time  Space
Philips       4   4   2     0.2     25     0.2     23     0.2     23    0.35     94
Philips Col   7  13   3    21.8  2,889    23.0  1,506    28.8  1,318    70.6  5,809
B&O           9  22   3    56.0  5,793    55.9  2,248    63.4  2,240   300.2  4,221
BRP           6   7   4    22.1  3,509    21.3    465    46.5    448    68.9    873
PowerDown1   10  20   2    81.3  4,129    79.2  1,539    82.6  1,467   164.7  4,553
PowerDown2    8  20   1    19.3  4,420    19.8  4,207    19.7  4,207    79.5  5,574
Dacapo        6  12   5    55.1  4,474    57.1  2,950    64.5  2,053   256.1  6,845
Gearbox       5   4   5    10.5  1,849    11.2    888   12.35    862    29.9  7,788
Fischer4      4   1   4    1.14    129    1.36     96    2.52     48     2.3    107
Fischer5      5   1   5    40.6  1,976    61.5  3,095   154.4    396   107.3  3,130
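The quoted averages can be recomputed from Table 1. The sketch below assumes that an average is taken over column totals (total space or time of an algorithm relative to the total for Current) rather than as a mean of per-system ratios. That reading is an assumption, but it reproduces the 42%, 7%, 55%, 54% and 49% figures after rounding; the industrial-only run-time increase comes out at roughly half a percent.

```python
# Rows of Table 1: system -> (time in s, space in KB) for each algorithm.
ALGOS = ("Current", "CDD", "Reduced", "CDD+BDD")
TABLE = {
    "Philips":     [(0.2, 25),    (0.2, 23),    (0.2, 23),    (0.35, 94)],
    "Philips Col": [(21.8, 2889), (23.0, 1506), (28.8, 1318), (70.6, 5809)],
    "B&O":         [(56.0, 5793), (55.9, 2248), (63.4, 2240), (300.2, 4221)],
    "BRP":         [(22.1, 3509), (21.3, 465),  (46.5, 448),  (68.9, 873)],
    "PowerDown1":  [(81.3, 4129), (79.2, 1539), (82.6, 1467), (164.7, 4553)],
    "PowerDown2":  [(19.3, 4420), (19.8, 4207), (19.7, 4207), (79.5, 5574)],
    "Dacapo":      [(55.1, 4474), (57.1, 2950), (64.5, 2053), (256.1, 6845)],
    "Gearbox":     [(10.5, 1849), (11.2, 888),  (12.35, 862), (29.9, 7788)],
    "Fischer4":    [(1.14, 129),  (1.36, 96),   (2.52, 48),   (2.3, 107)],
    "Fischer5":    [(40.6, 1976), (61.5, 3095), (154.4, 396), (107.3, 3130)],
}
INDUSTRIAL = [s for s in TABLE if not s.startswith("Fischer")]

def change(algo, column, systems=TABLE):
    """Relative change of a column total (0 = time, 1 = space) versus Current."""
    k = ALGOS.index(algo)
    base = sum(TABLE[s][0][column] for s in systems)
    new = sum(TABLE[s][k][column] for s in systems)
    return (new - base) / base

for algo in ("CDD", "Reduced"):
    print(f"{algo}: space {change(algo, 1):+.1%}, time {change(algo, 0):+.1%}")
print(f"CDD, industrial only: space {change('CDD', 1, INDUSTRIAL):+.1%}")
```

Note that Fischer5 is the one system where the plain CDD variant uses more space than Current, which is why the industrial-only saving is higher than the overall one.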



6 Towards a Fully Symbolic Timed Reachability Analysis 

The presented CDD-version of Uppaal uses CDD’s to store the PASSED-list, 
but zones (i.e. DBM’s) in the exploration of the timed automata. The next goal 
is to use CDD’s in the exploration as well, thus treating the continuous part 
fully symbolic. In combination with the suggested BDD-based approach for the 
discrete part, this would result in a fully symbolic timed reachability analysis, 
saving even more space and time. 

The central operations when exploring a timed automaton are time progress
and clock reset. Using tightened CDD’s, these operations can be defined along
the same lines as for DBM’s. A tightened CDD is one where along each path to
True all constraints are the tightest possible. In [LPWW98] we have shown
how to effectively transform any given CDD into an equivalent tightened one.
Figure 3(b) shows the tightened CDD-representation for example (b) from
Figure 2. Given this tightened version, the time progress operation is obtained
by simply removing all upper bounds on the individual clocks. In general, this
gives a CDD with overlapping intervals, which however can easily be turned into
a CDD obeying our definition. More details on these operations can be found in
[LPWW98].

CDD’s come equipped with an obvious notion of being equally fine partitioned.
For equally fine partitioned CDD’s we have the following normal form
theorem [LPWW98]:

Theorem 1. Let $C_1, C_2$ be two CDD’s which are tightened and equally fine
partitioned. Then $[\![C_1]\!] = [\![C_2]\!]$ iff $C_1$ and $C_2$ are graph-isomorphic.

A drastic way of achieving equally fine partitioned CDD’s is to allow only
atomic integer-bounded intervals, i.e. intervals of the form $[n, n]$ or $(n, n+1)$. This
approach has been taken in [ABKMPR97,BMPY97] demonstrating canonicity.
However, this approach is extremely sensitive to the size of the constants in the
analyzed model. In contrast, for models with large constants our notion of CDD
allows for coarser, and hence more space-efficient, representations.

7 Conclusion 

In this paper, we have presented Clock Difference Diagrams, CDD’s, a BDD-like
data-structure for effective representation and manipulation of finite unions of
zones. A version of the real-time verification tool Uppaal using CDD’s to store
explored symbolic states has been implemented. Our experimental results on
eight industrial examples found in the literature demonstrate significant space
savings (on average 42%) with a moderate increase in run-time (on average 7%).
Currently, we are pursuing realization of the fully symbolic state-space exploration
of the last section and [LPWW98], extending Uppaal from pure reachability
checking to checking for general real-time properties.

Abstract. A proof-theoretic mechanized verification environment that
allows taking advantage of the “convenient computations” method is presented.
The PVS theories encapsulating this method reduce the conceptual
difficulty of proving a safety or liveness property for all the possible
interleavings of a parallel computation by separating two different concerns:
proving that certain convenient computations satisfy the property,
and proving that every computation is related to a convenient one by a
relation which preserves the property. We define one such relation, the
equivalence of computations which differ only in the order of independent
operations. We also introduce the computation as an explicit semantic
object. The application of the method requires the definition of a “measure”
function from computations into a well-founded set. We supply
two possible default measures, which can be applied in many cases, together
with examples of their use. The work is done in PVS, and a clear
separation is made between “infrastructural” theories to be supplied as
a proof environment library to users, and the specification and proof of
particular examples.



1 Introduction 

This paper presents a proof environment for PVS [13,17,12] that supports convenient
computations and exploits partial order based on the independence of
operations in different processes, for the first time in a mechanized theorem-proving
context. Thus theoretical work defining this approach [8,9,11] is turned
into a proof environment for theorem proving that can be used without having to
rejustify basic principles. Besides making convenient computations practical in a
theorem-proving tool, we demonstrate what is involved in packaging such a framework
into a proof environment for use by nonexperts in the particular theory.
The modular structure of the theories (the units of PVS code that contain definitions
and theorems) should encourage using parts of the environment whenever
convenient computations are natural to the problem statement or proof.

In the continuation, basic theories are described in which computation sequences
are viewed as ‘first-class’ objects that can be specified, equivalence of
computations based on independence of operations is precisely defined, and a
proof method using measure induction over well-founded sets is encapsulated.
Two possible default measures are provided to aid the user in completing the
proof obligations, one involving the distance between matching pairs of events,
and one for computations that have layers of events. As an example among those
classes of properties that can be proven with the convenient computations method
(those properties preserved by the chosen reduction relation), we show a
subclass of the stable properties: final-state properties.

We summarize how a user can exploit the environment and describe two 
generic examples. The first demonstrates ideas of virtual atomicity for sequences 
of events local to a single process, and the second shows the use of the layers 
measure for a pipelined implementation of insertion sorting. In this and the other 
examples we have done, the proof environment (infrastructural theories) contains 
over half of the lines in the PVS specification files and also of the interactive
PVS prover commands, taken as a rough indication of the proof effort. 



Convenient Computations and the Need for Mechanization 

Methods that exploit the partial order among independent operations have been
used both for model checking and for general theorem proving (see [16] for a variety
of approaches). In particular, ideas of the independence of operations that
lead to partial order reductions have either been used for (usually linear) temporal
logic based model checking reductions [14,18,7], or for theoretical work on
general correctness proofs in unbounded domains [11,15,8]. For general correctness
(as opposed to model checking) no mechanization has been implemented
until now, and sample proofs have been hand simulated.

The intuitive idea behind convenient computations is simple. A system de- 
fines a collection of linear sequences of the events and/or states (where each 
sequence is called a computation). We often convince ourselves of the correc- 
tness of a concurrent system by considering some “convenient” computations 
in which events occur in an orderly fashion even though they may be in dif- 
ferent processes. It is usually easier to prove properties for these well-chosen 
computations than for all the possible interleavings of parallel processes. Two 
computations are called equivalent if they differ only in that independent (poten- 
tially concurrent) events appear in a different order. There are classes of safety 
and liveness properties which are satisfied equally by any two equivalent compu- 
tations (i.e., either both satisfy the property, or neither does). If we show that 
any non-convenient computation is equivalent to some convenient one, then we 
can conclude that any properties of this kind verified for the convenient compu- 
tations must also be satisfied by the non-convenient ones. 

In certain contexts, like sequential consistency of memory models and seria- 
lizability of database transaction protocols, where the convenient computations’ 
behavior is taken as the correct one by definition, the computation equivalence 
itself is the goal of the verification effort. Even when the goal is to prove certain 
properties of a system, if we attempt to reduce the problem to verification of the 
convenient computations, we might find flaws in our intuitive belief that they 
“represent” all possible computations. Finding that some computations are not 
equivalent to any convenient one can have as much practical value as finding a 
counterexample to a property expressed by a logical formula.

The availability of general purpose theorem proving tools such as PVS opens
the way for a mechanized application of the convenient computations technique.
Usually, attempts to carry out mechanized proofs in such tools raise issues that 
might be overlooked when using “intuitive” formal reasoning. Moreover, proofs 
can be saved and later, in the face of change, adjusted and rerun rather than 
just discarded. The down side of mechanized theorem proving methods is the 
need to prove many facts that are easily understood and believed to be correct 
by human intuition. Many of these facts are common to all applications of a 
proof approach. General definitions and powerful lemmas can be packed in theo- 
ries that provide a comfortable proof environment. These theories also clarify 
the new approach, and generate the needed proof obligations for any particular 
application. The proof obligations arise as “importing assumptions” of generic 
theories, as “type checking conditions” when defining objects of the provided 
types, or as antecedents (preconditions) in the provided theorems that have the 
form of a logical implication. 



Existing Versus Proposed Verification Styles 

The PVS tool is a general-purpose theorem prover with a higher-order logic
designed to verify and challenge specifications, and is not tailored to any computation
model or programming language. It does provide a prelude of theories
about arithmetic, inequalities, sets, and other common mathematical structures,
and some additional useful libraries. Decidable fragments of logic are treated by
a collection of decision procedures that replace many tedious subproofs. However,
it has no inherent concept of states, operations, or computations. Usual
verification examples, like proving an invariant in a transition system, involve
the definition of states, initial conditions (initial-state functions), and transitions
(next-state functions) by the user. To prove invariance of a state property P, one
just writes and proves an induction theorem of the form:

$(initial(s) \rightarrow P(s)) \wedge (P(s) \rightarrow P(next(s)))$

The computations themselves are not mentioned directly, and the property “in
every state, P” ($\Box P$ of linear temporal logic) is justified (usually implicitly) by
such an induction theorem.
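For a finite state space the two conjuncts of such an induction theorem can simply be checked by enumeration. The sketch below illustrates the proof shape only; it is not PVS, and the counter system and the predicates are hypothetical.

```python
def invariant_by_induction(states, initial, next_state, P):
    """Check (initial(s) -> P(s)) and (P(s) -> P(next(s))) for every state s."""
    base = all(P(s) for s in states if initial(s))
    step = all(P(next_state(s)) for s in states if P(s))
    return base and step

# Hypothetical system: a counter modulo 6 that advances in steps of 2.
states = range(6)
initial = lambda s: s == 0
next_state = lambda s: (s + 2) % 6
is_even = lambda s: s % 2 == 0

print(invariant_by_induction(states, initial, next_state, is_even))
print(invariant_by_induction(states, initial, next_state, lambda s: s < 4))
```

The second property fails the inductive step (state 2 satisfies it but its successor 4 does not), even though no computation is ever inspected as a whole. This is exactly the style in which computations stay implicit.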

As part of the proof environment presented here, we provide precise defi- 
nitions for computations, conditional independence of operations, computation 
equivalence, and verification of properties based on computation equivalence and 
convenient computations using well-founded sets. 

We define a “computation” type as a function from discrete time into “steps”
(a state paired with an operation), and specify temporal properties as arbitrary
predicates over computations. Thus, if c : comps is a function from t : time to
steps, and st is a projection function from a step to its state component, then

$\Box P = global(P)(c : comps) : bool = \forall t \in time : P(st(c(t)))$

This style is needed so we can reason about computations and their relationships.
Note that we are not limited to linear-time properties by this style of expression.
The higher-order logic of PVS allows us, for example, to express a CTL* formula
like “GEFp” by means of a predicate on computations:

$GEFp(c : comps) : boolean =
    \forall (t : time) : \exists (d : comps) : \forall (tp : time \mid tp < t) : d(tp) = c(tp)
    \wedge \exists (tf : time \mid tf \geq t) : p(st(d(tf)))$

or in words, “for every time point t there exists a computation (called d) which
is identical to c before time t and, at time t or later, has a state that satisfies p”.

2 The Theories 

In this section we describe a hierarchy of theories, whose IMPORTING relationships
can be seen in Figure 1. They provide the foundation for reasoning about
equivalence among computations. The top level of the hierarchy contains three
main components: the computation model, the equivalence notion, and the proof
method. An additional default measure component uses the other theories in
specific contexts that hide some of the proof obligations in an application.

In the computation model component, transition systems and computations 
over them are defined. The option of providing global restrictions on possible 
sequences of transitions (for example, in order to introduce various notions of 
fairness [6,2]) is also provided. In the equivalence component, theories are pre- 
sented that encode when transitions are independent, and when computations 
are defined to be equivalent (in that independent transitions occur in a different 
order). The proof method component shows how to prove that for an arbitrary 
set and subset, every element of the set is related to some element of the sub- 
set, using well-foundedness. Here the elements are arbitrary, and the relation 
is given as a parameter. When instantiated with the equivalence relation from 
the equivalence component, the needed proof rule and its justification are pro- 
vided. As an example of the classes of properties relevant to this method, we 
include a theory that defines the “final-state properties” and proves that they 
are preserved by the defined computation equivalence relation. 

After presenting these theories in somewhat more detail, two default measu- 
res are described, for matching pairs of operations and for layered computations. 
We then summarize how a user should apply the theories to an application, and 
describe two examples. 

(Note: The PVS files for the proof environment and the examples are available
from the Web page at http://www.cs.technion.ac.il/~marce).

2.1 Computation Model 

Our model of computation is defined by three parameterized theories:
step_sequences, execution_sequences, and computations. Each application
using these theories must define types for the specific states and the operations as
actual parameters upon instantiation of the theories. The theory step_sequences,
based on these two types, defines function types needed to build a transition system:
initial_conditions, enabling_conditions, and next_state_functions.
It also defines the types time, steps (records with two fields: st : states and
op : ops), and step_seqs (functions from time to steps).

Fig. 1. The hierarchy of theories.

In an application, the user defines the initial states, the enabling conditions
and the next-state functions for each operation, and then instantiates the
theory execution_sequences. This theory defines the subtype of the well-built
execution_sequences: the ones that start in an initial state, and whose steps
are consecutive, i.e., the operations are enabled on the corresponding state, and
the state in the following step is their next-state value.

The theory computations has an additional parameter to be provided by a
user, namely, a predicate on execution_sequences called relevant? which is
used to define the subtype comps. This includes only those execution_sequences
which satisfy the added predicate. This restriction can be used to focus on a special
subset of all the possible execution sequences, for example to express fairness
assumptions or to analyze just a part of the system’s behavior (such as a finite
collection of transactions).
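The conditions on well-built execution sequences can be sketched for finite prefixes as follows. This is an illustration in Python rather than PVS. The PVS types are functions over all of time, whereas here a sequence is a finite list of steps (state, operation), and the counter application together with its relevant predicate is hypothetical.

```python
def is_execution_sequence(seq, initial, enabled, next_state):
    """seq is a finite list of steps (state, op); checks the well-built conditions."""
    if not seq or not initial(seq[0][0]):
        return False
    for k, (st, op) in enumerate(seq):
        if not enabled[op](st):
            return False                      # op must be enabled on its state
        if k + 1 < len(seq) and seq[k + 1][0] != next_state[op](st):
            return False                      # next step must start in op's result
    return True

def is_computation(seq, initial, enabled, next_state, relevant):
    """A computation is an execution sequence that also satisfies relevant."""
    return is_execution_sequence(seq, initial, enabled, next_state) and relevant(seq)

# Hypothetical application: a counter that may increment up to 2 or idle.
initial = lambda st: st == 0
enabled = {"inc": lambda st: st < 2, "idle": lambda st: True}
next_state = {"inc": lambda st: st + 1, "idle": lambda st: st}
relevant = lambda seq: any(op == "inc" for _, op in seq)   # a toy restriction

good = [(0, "inc"), (1, "idle"), (1, "inc"), (2, "idle")]
bad = [(0, "inc"), (2, "idle")]
print(is_computation(good, initial, enabled, next_state, relevant),
      is_execution_sequence(bad, initial, enabled, next_state))
```

In the PVS theories these checks are not run but appear as subtype constraints, so that every object of type comps is known to satisfy them.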

2.2 Computation Equivalence 

Equivalence between computations and independence of operations are formalized
by the theories conditional_independence and equivalence_of_comps.





Mechanizing Proofs of Computation Equivalence

The functional independence defined in the first of these theories is over pairs 
of operations and states, expressing when two operations are independent in a 
given state. It requires that the execution of either of the two operations doesn’t 
interfere with the other’s enabledness, and that the result of applying both ope- 
rations from the given state must be the same regardless of their ordering. 
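One possible reading of this functional independence is sketched below. The exact formulation in the PVS theory may differ: here an operation is modelled as a pair (enabled, next_state) of functions, non-interference is taken to mean that running one operation leaves the other's enabledness unchanged, and the example operations are hypothetical.

```python
def functionally_independent(op_a, op_b, state):
    """Each op is a pair (enabled, next_state) of functions on states."""
    (en_a, nx_a), (en_b, nx_b) = op_a, op_b
    if en_a(state) and en_b(nx_a(state)) != en_b(state):
        return False                 # running a changes whether b is enabled
    if en_b(state) and en_a(nx_b(state)) != en_a(state):
        return False                 # running b changes whether a is enabled
    if en_a(state) and en_b(state):
        return nx_b(nx_a(state)) == nx_a(nx_b(state))   # both orders agree
    return True

# Hypothetical operations on states (x, y).
inc_x = (lambda s: True, lambda s: (s[0] + 1, s[1]))
inc_y = (lambda s: True, lambda s: (s[0], s[1] + 1))
double_x = (lambda s: True, lambda s: (2 * s[0], s[1]))
take_y = (lambda s: s[1] > 0, lambda s: (s[0], s[1] - 1))

print(functionally_independent(inc_x, inc_y, (0, 0)),
      functionally_independent(inc_x, double_x, (1, 0)),
      functionally_independent(inc_y, take_y, (0, 0)))
```

The third pair shows the dependence on the state: inc_y enables take_y in (0, 0), so the two interfere there, while in (0, 1) they commute.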

Though this functional independence expresses commutativity of operations, 
it is not practical to prove it each time we need to show that a pair of consecutive 
operations can be exchanged. To separate this (local) consideration and make 
later proofs simpler, we allow the user to define a separate conditional indepen- 
dence relation also over pairs of operations and states. This predicate must be 
symmetric and it must imply the functional independence of the two operations 
from the given state. (These conditions will appear as proof obligations when the 
theory is instantiated.) This arrangement allows the user to choose how much 
independence is to be considered for a particular application. 

The theory equivalence_of_comps first defines the result of swapping two
independent operations on a given state in an execution sequence. If the need
arises to prove that the result is a legal computation (a relevant? execution
sequence), it is passed as a proof obligation to the application since relevant? is
only defined there. The rest of the theory deals only with legal computations
that are identical up to the swapping of independent operations, defining:

— one_swap_equiv?(c1, c2): c1 and c2 are different and differ by a single swap,
i.e., c2 is the result of swapping consecutive independent operations in c1 at
some time t.

— swap_equiv_n?(c1, c2, n): c1 and c2 differ by up to n single swaps.

— swap_equiv?(c1, c2): this is the transitive closure of one_swap_equiv? and
is true iff there is an n s.t. swap_equiv_n?(c1, c2, n).
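The three relations can be sketched on finite prefixes of computations. This is an illustration, not the PVS definitions: a computation is a list of steps (state, operation), and the operations and the independence relation are hypothetical stand-ins for what an application would supply.

```python
next_state = {"inc_x": lambda s: (s[0] + 1, s[1]),
              "inc_y": lambda s: (s[0], s[1] + 1),
              "idle": lambda s: s}

def independent(op_a, op_b, state):
    """Hypothetical conditional independence relation supplied by the application."""
    return {op_a, op_b} == {"inc_x", "inc_y"}

def swap_at(c, t):
    """Swap the operations of steps t and t+1, or return None if not independent."""
    (st, op_a), (_, op_b) = c[t], c[t + 1]
    if not independent(op_a, op_b, st):
        return None
    return c[:t] + [(st, op_b), (next_state[op_b](st), op_a)] + c[t + 2:]

def one_swap_equiv(c1, c2):
    return c1 != c2 and any(swap_at(c1, t) == c2 for t in range(len(c1) - 1))

def swap_equiv_n(c1, c2, n):
    """c1 and c2 differ by up to n single swaps."""
    if c1 == c2:
        return True
    if n == 0:
        return False
    neighbours = (swap_at(c1, t) for t in range(len(c1) - 1))
    return any(d is not None and swap_equiv_n(d, c2, n - 1) for d in neighbours)

c1 = [((0, 0), "inc_x"), ((1, 0), "inc_x"), ((2, 0), "inc_y"), ((2, 1), "idle")]
c2 = [((0, 0), "inc_y"), ((0, 1), "inc_x"), ((1, 1), "inc_x"), ((2, 1), "idle")]
print(one_swap_equiv(c1, c2), swap_equiv_n(c1, c2, 2))
```

Only the state between the two swapped steps changes; independence guarantees that the state after both operations is the same, so the rest of the sequence is untouched.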

In the theory, the relation swap_equiv? is proven to be an equivalence relation.
This relation is the formalization of the intuitive notion of equivalent
computations, and the equivalence classes that it generates in the set of all computations
are called interleaving sets in the context of partial order reductions
and the temporal logic ISTL* [9, 10].

2.3 Proof Method 

Consider an arbitrary set (or data type) T, with a preorder relation path_to?
over its elements, and choose a subset of T: those elements which satisfy a
given predicate. We want to prove that from each element in T we can reach
one in the chosen subset. We first pick a “measure function” which maps elements
from T into elements of a well-founded structure (M, <). In the theory
subset_reachability we show that it suffices to prove that each element outside
the chosen subset has a path to one with a strictly smaller measure.

The theory conv_comps has parameters that define a computation model,
a reduces_to? preorder, a predicate conv? for choosing the convenient computations,
and a measure function m into a well-founded set. These are used with the
subset_reachability theory to provide a sufficient condition:

$\forall c : \neg conv?(c) \rightarrow \exists d : reduces\_to?(c, d) \wedge m(d) < m(c)$    (1)

from which reduction to convenient computations is proved:

$\forall c : \exists d : conv?(d) \wedge reduces\_to?(c, d)$

It also provides a theorem defining the two added proof obligations that must
be discharged in an application to verify any property p? for all computations:

$\forall c : conv?(c) \rightarrow p?(c)$

$\forall c, d : reduces\_to?(c, d) \rightarrow (p?(d) \rightarrow p?(c))$    (2)

In other words, p? must be true for the convenient computations, and must 
respect the preorder used in the theory. In a wider context, the theory of con- 
venient computations can be used to reduce the verification of properties of 
general computations to the simpler problem of verification over the conveni- 
ent computations. The reduces_to? relation can be any preorder for which the 
required premises ((1) and (2)) can be proven. Since the theory is parametric, 
other computation models and notions of equivalence can be used, besides those 
seen here. 
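The argument behind condition (1) can be made concrete: repeatedly applying a measure-decreasing reduction step must reach a convenient element, because the measure cannot decrease forever. The sketch below is an illustration under invented assumptions. Elements are tuples of letters, the convenient ones are the sorted tuples, the measure counts inversions, and a reduction step swaps one adjacent out-of-order pair.

```python
def reduce_to_convenient(c, conv, measure, step):
    """Follow condition (1) until a convenient element is reached."""
    while not conv(c):
        d = step(c)
        assert measure(d) < measure(c), "step must strictly decrease the measure"
        c = d
    return c

def inversions(c):
    """Measure: number of pairs that appear in the wrong order."""
    return sum(1 for i in range(len(c)) for j in range(i + 1, len(c)) if c[i] > c[j])

def swap_first_inversion(c):
    """Reduction step: exchange the first adjacent out-of-order pair."""
    for i in range(len(c) - 1):
        if c[i] > c[i + 1]:
            return c[:i] + (c[i + 1], c[i]) + c[i + 2:]
    return c

conv = lambda c: inversions(c) == 0
result = reduce_to_convenient(("b", "a", "b", "a"), conv, inversions, swap_first_inversion)
print(result)
```

In the PVS theory the loop is replaced by well-founded induction on the measure, and the chain of steps is collapsed into a single reduces_to? fact by transitivity of the preorder.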



2.4 Property Classes

Any property preserved by the relation (preorder) chosen as a reduction to convenient
computations is a candidate to be verified by this method. A common
example is that of stable properties. The theory final_state_properties exemplifies
a special case of stable properties. It defines a final state of a computation
as any point in time after which the computation remains quiescent, i.e. every
operation-state pair is the same as the next one. A function is defined that, given
a state property, generates a computation predicate that enforces that state property
on all final states. The “final-state properties” thus generated are proven
invariant under the swap-equivalence relation.



2.5 Default Mesisures 

The choice of measure functions should address the intuitive notion of “how 
close” a computation is to a convenient one. (e.g. how many independent ope- 
ration pairs should be swapped). Only then will the proof obligations generated 
be easy (if not trivial) to discharge. We provide theories with two measures that 
widen the support given to the user of the method. 
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Matching intervals measure: In [8] the convenient computations method was 
applied (manually) to the sequential consistency problem. The measure involved 
intervals of selected events (computation steps) and their length. The measure 
value was lowered by moving unrelated events out of the interval until all the 
selected events happened consecutively. We provide a simpler version which can 
be applied to achieve the same effect. An interval is defined as a pair of points 
in time (t1, t2), and its distance (length) is t2 - t1 - 1 (thus a consecutive pair 
(t, t + 1) has distance zero). The measure value for a computation is defined as 
the sum of all the distances of its matching intervals. 
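
As a concrete reading of this definition, the following minimal Python sketch computes the measure from a list of matching intervals. The function names are ours and purely illustrative; the PVS theory itself is not reproduced here.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (t1, t2): two points in time with t1 < t2


def distance(interval: Interval) -> int:
    """Length of an interval: the number of steps strictly between its ends."""
    t1, t2 = interval
    if t1 >= t2:
        raise ValueError("an interval must satisfy t1 < t2")
    return t2 - t1 - 1


def matching_intervals_measure(intervals: Iterable[Interval]) -> int:
    """Sum of the distances of all matching intervals of one computation."""
    return sum(distance(i) for i in intervals)
```

For instance, the intervals (0, 3) and (5, 6) contribute 2 and 0 respectively, so the measure is 2.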

To use this measure, the application must supply a predicate match?(c,i) 
that defines the “matching” intervals i (pairs of points) in a given computation 
c. In a matching interval we want two events to ideally happen immediately 
one after the other, in a certain order, even if in many computations there are 
intervening events. Typical cases are sending and receiving a value over an empty 
communication channel, or performing a series of local steps in a process. The 
minimum value is attained when all the matching intervals have zero distance. In 
a reasonable application of the method, the definition of the matching intervals 
should make it easy to prove that nonconvenient computations have a nonzero 
measure. The match? predicate must satisfy the following requirements: 

— Every computation has finitely many matching intervals. This is to make the 
measure finite. (An alternative would be to require that the set of nonzero- 
distance matching intervals be finite, and sum distances only over that set.) 

— The matching intervals in two one-swap-equivalent computations are the 
same, up to the exchange of the end-points affected by the swap. 

— No two matching intervals start in the same time point and no two end 
together. This is used to simplify the number of cases. 

— Swappable (i.e., independent consecutive) operations cannot appear at the 
ends of a (zero-distance) matching interval. 

These requirements mainly restrain the choice of the match? function to a usable 
one. Again, for reasonable choices their proof is straightforward. 

The theory also provides and proves a heuristic for finding a computation d 
which is equivalent to a given computation c and has a smaller measure. Such a 
d exists if c satisfies that for some t: 

$(\mathit{only\_starts\_interval?}(c, t) \wedge \neg\mathit{only\_starts\_interval?}(c, t+1)) \;\vee$ 

$(\mathit{only\_ends\_interval?}(c, t+1) \wedge \neg\mathit{only\_ends\_interval?}(c, t))$ 

where $\mathit{only\_starts\_interval?}(c, t) = \mathit{starts\_interval?}(c, t) \wedge \neg\mathit{ends\_interval?}(c, t)$ 
(and only_ends_interval? is defined similarly). 

The predicates starts_interval? (c,t) and ends_interval? (c,t) state that 
there is a matching interval in c that starts(ends) at time t. This means that 
either an event only starting an interval is followed by one not only starting an 
interval, or an event only ending an interval is preceded by one not only ending 
an interval. Due to the other assumptions, if this holds, the relevant pair can be 
exchanged, yielding a computation with a smaller measure. 
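
The heuristic can be illustrated on the interval structure alone. The Python sketch below is ours, not the PVS theory: it represents a computation only by its set of matching intervals, does not check that the operations at t and t + 1 are actually independent, and skips any pair (t, t + 1) that is itself a matching interval (our reading of the fourth requirement above).

```python
from typing import Optional, Set, Tuple

Interval = Tuple[int, int]


def measure(intervals: Set[Interval]) -> int:
    """Sum of interval distances, as in the matching intervals measure."""
    return sum(t2 - t1 - 1 for t1, t2 in intervals)


def find_swap_candidate(intervals: Set[Interval], length: int) -> Optional[int]:
    """Return a time t satisfying the heuristic's condition, or None."""
    starts = {t1 for t1, _ in intervals}
    ends = {t2 for _, t2 in intervals}

    def only_starts(t: int) -> bool:
        return t in starts and t not in ends

    def only_ends(t: int) -> bool:
        return t in ends and t not in starts

    for t in range(length - 1):
        if (t, t + 1) in intervals:
            continue  # assumed non-swappable: ends of a zero-distance interval
        if (only_starts(t) and not only_starts(t + 1)) or (
            only_ends(t + 1) and not only_ends(t)
        ):
            return t
    return None


def swap_endpoints(intervals: Set[Interval], t: int) -> Set[Interval]:
    """Exchange the time points t and t + 1 in every interval end-point."""
    def move(p: int) -> int:
        return t + 1 if p == t else t if p == t + 1 else p

    return {(move(a), move(b)) for a, b in intervals}
```

With the single interval (0, 3) in a computation of length 4, the candidate is t = 0, and exchanging the end-points lowers the measure from 2 to 1.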

Layered measure: Another way of thinking of convenient computations of a 
program is to define ordered phases or layers of execution [5,4]. Each event 
is associated with a layer. If the events in every layer appear contiguously in 
the computation, without events from a layer getting mixed with events from 
an earlier layer, the computation is considered convenient. Examples where this 
approach seems natural are programs with communication-closed layers and dis- 
tributed snapshot algorithms [3]. In contrast to some of those previous works, 
however, we do not focus on syntactic layers: the same program instruction oc- 
curring more than once might produce events belonging to different execution 
layers. 

The layered_measure theory considers programs with a finite number of 
layers, where all but the last one must be finite and eventually finish, i.e., for 
each computation and for each layer in it, there is a time after which all the 
events belong to other layers. If infinite computations are considered, this can 
be achieved by applying some sort of fairness assumption. 

For each event (computation step) except those associated with the last layer, 
we count the number of previous events that belong to a (strictly) later layer than 
the layer of that event. The measure value of a computation is the sum of those 
counts. Clearly, computations with a zero measure value should be convenient 
in an application, since no event is preceded by an event from a later layer. 
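
A minimal Python sketch of this count follows. It assumes a finite prefix of a computation given simply as the list of layer numbers of its events; the function name and this representation are ours.

```python
from typing import Sequence


def layered_measure(layers: Sequence[int], last_layer: int) -> int:
    """For each event outside the last layer, count the earlier events that
    belong to a strictly later layer, and sum those counts."""
    total = 0
    for t, layer in enumerate(layers):
        if layer == last_layer:
            continue  # events of the last layer are not counted
        total += sum(1 for earlier in layers[:t] if earlier > layer)
    return total
```

For example, the layer sequence 1, 0, 0, 2 (with last layer 2) has measure 2, because each of the two layer-0 events is preceded by the layer-1 event.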

The application must define a natural number last_layer and a function (layer) 
that maps a computation and a time point into a natural number less than 
or equal to last_layer. This function must meet the following requirements: 

— As mentioned before, for every layer below last_layer there is a time after 
which there are no more time points belonging to it. Proofs of this require- 
ment are based on basic progress of the computation, which can be supported 
by fairness assumptions from the relevant? predicate. 

— The layer function is the same for one-swap-equivalent computations, ex- 
cept at the two time points involved in the swap, where the layer values are 
interchanged. This is trivial for reasonable definitions of the layer function. 

— For any time t where layer(t) > layer(t + 1), the operations at t and t + 1 
must be independent, i.e. a swap must be possible. This seemingly strong 
requirement is easy to prove if layering is appropriate for the application. 

In this theory, it is proven that any assignment of layers satisfying these condi- 
tions guarantees that any computation in which a later-layer event comes before 
a previous-layer event (and thus having a non-zero measure) is equivalent to one 
with a smaller measure. Thus, showing a drop in the measure value is hidden 
from a user, if the three conditions above can be shown. 
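
The effect of the three conditions can be imitated on layer sequences: whenever layer(t) > layer(t + 1), the pair is swapped, and the measure drops. The sketch below is only an illustration under our own simplifications: a computation is a finite list of layer numbers, every out-of-order adjacent pair is assumed swappable (the third condition), and the measure is computed as the number of out-of-order pairs, which coincides with the layered measure because last-layer events are never preceded by a later layer.

```python
from typing import List, Optional, Sequence, Tuple


def measure(layers: Sequence[int]) -> int:
    """Number of pairs (i, j), i < j, whose layers are out of order."""
    return sum(
        1
        for j in range(len(layers))
        for i in range(j)
        if layers[i] > layers[j]
    )


def reduce_once(layers: Sequence[int]) -> Optional[List[int]]:
    """Swap the first adjacent pair with layer(t) > layer(t + 1), if any."""
    for t in range(len(layers) - 1):
        if layers[t] > layers[t + 1]:
            swapped = list(layers)
            swapped[t], swapped[t + 1] = swapped[t + 1], swapped[t]
            return swapped
    return None


def reduce_to_convenient(layers: Sequence[int]) -> Tuple[List[int], int]:
    """Repeat single swaps until the measure is zero; return result and step count."""
    current, steps = list(layers), 0
    while True:
        nxt = reduce_once(current)
        if nxt is None:
            return current, steps
        assert measure(nxt) == measure(current) - 1  # each swap lowers the measure
        current, steps = nxt, steps + 1
```

Starting from the layer sequence 2, 0, 1, 0 (measure 4), four swaps reach 0, 0, 1, 2, in which the events of each layer appear contiguously.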

3 Using the Method: A Summary 

3.1 The User’s Problem Description 

First, the computation model must be described by defining the types of the 
states and operations, the initial states, the operations’ enabling conditions, and 
their next-state functions. Any necessary global restrictions such as fairness or 
finiteness are then added to define the relevant computations. 

Second, the user must define the conditional independence relation between 
the operations at given states. This is used to instantiate the computation 
equivalence theory which will provide the swap-equiv relation. The theory will 
generate proof obligations to show that the user’s suggested relation is a valid 
independence relation. Finally, in the proof method theories, the convenient 
computations must be provided in the instantiation of the theory conv_comps, 
and a measure function must be defined (either by the user, or using one of the 
two provided). 

These are all the definitions needed to prove computation equivalence. Aside 
from the importing assumptions of the theories used, the user is left to prove 
that for every non-convenient computation there is a reduction to a computation 
with a lower measure (for the two default measures provided there is a sufficient 
condition that makes that proof much easier). 

To prove any property (predicate over computations) for all the computations 
of an application, the theorem provided in conv_comps leaves the user to prove 
that the property holds for convenient computations and that equivalence over 
the user’s independence relation preserves the property. 

3.2 The User’s Design Decisions and Tradeoffs 

As in any proof method, experience is essential in successfully applying the ele- 
ments of this method. Choosing the relevant computations can be critical, espe- 
cially in proving the importing assumptions of the theories that define measure 
functions. 

When proving that the reduction preserves the property to be verified, and 
also when proving that the independence relation implies functional independence, 
it helps to have as small an independence relation as possible. This conflicts 
with the interest of having more opportunities to swap operations in order 
to find a computation with a smaller measure. 

If we include more computations in the class of convenient computations, 
it may be easier to show a reduction to a smaller measure for the remaining 
nonconvenient computations. On the other hand, we reduce the benefit of the 
use of equivalence by having to prove the desired properties directly for a larger 
class of convenient computations. 

As seen in the proof obligation (2), the properties that can be verified when 
the theories are combined in an application are those which are preserved by 
the reduction relation. A lemma in the theory equivalence_of_comps simplifies 
this requirement: it suffices to show that two computations which differ only in 
the order of one pair of independent operations must satisfy p? equally: 

$\forall c, d : \mathit{one\_swap\_equiv?}(c, d) \Rightarrow (p?(d) \Rightarrow p?(c))$ (3) 

This requirement is easy to prove for large classes of properties, e.g., those 
defined in the theory final_state_properties. 

In certain cases, one might need to add “history” variables to the state 
(without affecting the behavior of the rest of the state components) to support 
property verification. For example, in order to verify mutual exclusion, a flag that 
records a violation of the mutual exclusion should be added. This is done so that 
two computations which differ only by the order of a pair of operations are not 
considered equivalent if one of them violates the mutual exclusion requirement 
and the other does not. The original system variables might not suffice to make 
those operations functionally dependent. 

The characterization of the properties which can be proven by this method 
is a subject worth further research. In this paper we have focused on the proofs 
that computations are equivalent, and particularly on showing that every 
computation is equivalent to one of the convenient computations. 



4 Example 1: Using the Matching Intervals Measure 

Our first example (a full listing is at the Web page given earlier) shows how a 
sequence of local actions in a process can be considered atomic. It is typical of 
many situations where a sequence of local actions can be viewed as virtually 
atomic [1]. 



    flag: bool=FALSE    tl,tm,x: nat

         l0: tl=1         % local    ||   PM:  m0: tm=2              % local
         l1: x=tl         % global   ||        m1: await flag=TRUE
         l2: flag=TRUE               ||        m2: x=x+tm            % global
         l3: STOP                    ||        m3: STOP


Here the operation l2 must occur before m1, so in fact we observe all the 
possible interleavings of the operation m0 (PM’s initialization) with the operations 
l0-l2. The states type contains the two program counters explicitly. The 
ops type is {l0, l1, l2, m0, m1, m2, stop}. The initial? predicate on states is 
straightforward. The en? enabling condition and the next next-state function 
are defined in table format to enhance readability. 

In this example we define two operations as independent if they are both 
stop or if they belong to different processes and satisfy indep_l_m?, a predicate 
given in tabular form. The table’s rows and columns represent operations which 
belong to different processes, and the entries are state predicates, though in this 
particular case they are not state-dependent (always TRUE or FALSE). This 
table was filled based on our understanding of the semantics of the program- 
ming language. The independence relation must be proved to imply functional 
equivalence and to be symmetric, as a type-correctness requirement. After that 
is done, to decide if two operations can be swapped, we only need to look them 
up in the table. The convenient computations are chosen as those in which m0 
is executed immediately before m1 (and after l0-l2). We choose to use the default 
measure with matching pairs. Here we can define the match? predicate 
so that only the pair (m0, m1) matches. Clearly, when the measure is zero, the 
computation is convenient. 

The proof that for any nonconvenient computation there is an equivalent one 
with a smaller measure was accomplished by using the theorem provided in the 
matching_intervals_measure theory. Note that if the instructions in question 
were in a loop, the definition of “matching intervals” would have to guarantee 
that the proper occurrences of the instructions are matched, e.g., by using a 
loop counter as well as the operations. There is another condition: computations 
must have a finite number of matching intervals. In the present example, this is 
easy to show since each operation is done exactly once. In general, this would be 
proven by using some kind of finiteness constraint, typically from the relevant? 
predicate. 
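
To make the example tangible, the following Python sketch enumerates the interleavings of the two processes and computes the matching intervals measure for the pair (m0, m1). It is our own illustration, not the PVS listing: the stop operations are omitted, and the initial values of tl, tm and x are assumed to be 0 (the program leaves them unspecified).

```python
from itertools import permutations
from typing import List, Tuple

OPS = ["l0", "l1", "l2", "m0", "m1", "m2"]


def is_computation(seq: Tuple[str, ...]) -> bool:
    """Program order within each process, and m1 (await flag) only after l2."""
    pos = {op: i for i, op in enumerate(seq)}
    return (
        pos["l0"] < pos["l1"] < pos["l2"]
        and pos["m0"] < pos["m1"] < pos["m2"]
        and pos["l2"] < pos["m1"]
    )


def final_x(seq: Tuple[str, ...]) -> int:
    """Execute the operations in the given order and return the final x."""
    flag, tl, tm, x = False, 0, 0, 0  # initial values of tl, tm, x are assumed
    for op in seq:
        if op == "l0":
            tl = 1
        elif op == "l1":
            x = tl
        elif op == "l2":
            flag = True
        elif op == "m0":
            tm = 2
        elif op == "m1":
            assert flag  # the await is enabled only when flag is TRUE
        elif op == "m2":
            x = x + tm
    return x


def measure(seq: Tuple[str, ...]) -> int:
    """Distance of the single matching interval (m0, m1)."""
    return seq.index("m1") - seq.index("m0") - 1


computations: List[Tuple[str, ...]] = [
    p for p in permutations(OPS) if is_computation(p)
]
```

Under these assumptions there are four computations, with measures 0 through 3; the one with measure 0 is the convenient computation l0, l1, l2, m0, m1, m2, and all four end with x = 3.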

Although our main concern is proving computation equivalence, we show 
the remaining proof obligations for a final-state property. The proof obligation 
(conv_implies_p) shows that p holds for the convenient computations and is not 
completed here. The other obligation (one_swap_equiv_preserves_p) is easily 
discharged by invoking a theorem from the theory final_state_properties. 

5 Example 2: Using the Layered Measure 

Our second example (also available at the Web page) is a typical representative 
of the pipelined processing paradigm. In our example, all computations are equi- 
valent to those that execute “one wave at a time,” i.e., in which a new input is 
entered only when all the operations related to the previous inputs have been 
finished. The program is a pipelined insertion-sort algorithm in which the buffers 
between the processors can hold a single value. We assume that each processor 
does its local actions atomically: taking its input, comparing it with the value 
it holds, and sending the maximum between them to the next processor in the 
pipeline. To understand why it is complicated to prove that the algorithm cor- 
rectly sorts the inserted values without the convenient computations approach, 
consider a general computation. In a typical state, the k first processors already 
have a value, and some of them have a nonempty incoming buffer. There could 
be several such processors whose successor has room in its buffer, so many diffe- 
rent operations would be enabled in such a state. To verify the sorting algorithm 
we need a general invariant, much harder to find than the one needed if we only 
have to consider convenient computations in which there is at most one possible 
continuation at a time. 

The example is described in the theory pipeline_sort, parametric on the 
type of the values being sorted, their total-ordering relation, the number of input 
values (and of processors) NUM, and an array from 0 to NUM-1 holding those values. 

The processors’ indices range from 1 up to NUM. Since we choose to use the 
layers approach, we augment the state variables and next-state functions to allow 
defining the layer value of each computation step. The system state includes a 
counter of the number of inputs already inserted, and an array of processor 
states. Each such processor state includes a locally held value, an input buffer, 
and an integer (input_layer) that holds the layer associated with that input. 
This number is taken from the global counter when inputting a new value into 
the first processor in the pipeline, and is copied to the next processor when a 
value is propagated forward, regardless of the result of the comparison between 
the input and the locally held value. 

The layer value of an “input-new-value” operation is the value of the global 
input counter. For a normal computation step by any processor, the layer value 
is the input_layer stored in that processor’s state. Since it originated from 
the global counter’s value when the layer began, this value ranges from 0 up to 
NUM-1. The idling operation, enabled only at the end of the whole computation, 
has a layer value NUM. 
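
The state and layer bookkeeping described above can be imitated by a small simulation. The Python sketch below is a hypothetical rendering, not the pipeline_sort theory: processors are indexed from 0, a random scheduler (with a fixed seed) picks among the enabled operations, the idling operation is omitted, and we assume a processor that already holds a value may step only when its successor's buffer is empty.

```python
import random
from typing import List, Optional, Tuple


def run_pipeline(inputs: List[int], seed: int = 0) -> Tuple[List[Optional[int]], List[int]]:
    """Run one randomly scheduled computation of the pipelined insertion sort.

    Returns the values held by the processors at the end and the layer value
    of every computation step, in execution order."""
    num = len(inputs)
    rng = random.Random(seed)
    held: List[Optional[int]] = [None] * num            # locally held values
    buf: List[Optional[Tuple[int, int]]] = [None] * num  # (value, input_layer)
    count = 0                                            # inputs inserted so far
    layers: List[int] = []
    while True:
        enabled: List[object] = []
        if count < num and buf[0] is None:
            enabled.append("input")
        for i in range(num):
            if buf[i] is None:
                continue
            # A processor with no held value just stores its input; otherwise
            # it forwards the maximum and needs room in the next buffer.
            if held[i] is None or (i + 1 < num and buf[i + 1] is None):
                enabled.append(i)
        if not enabled:
            return held, layers
        op = rng.choice(enabled)
        if op == "input":
            layers.append(count)                # layer of an input is the counter
            buf[0] = (inputs[count], count)
            count += 1
        else:
            value, layer = buf[op]
            layers.append(layer)                # layer of a step is its input_layer
            buf[op] = None
            if held[op] is None:
                held[op] = value
            else:
                low, high = min(held[op], value), max(held[op], value)
                held[op] = low
                buf[op + 1] = (high, layer)     # layer is copied forward
```

Whatever the schedule, the processors end up holding the inputs in ascending order, and input number j accounts for j + 2 steps (one insertion and one step in each of the first j + 1 processors), all carrying layer value j.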

The initial states, enabling conditions and next-state-functions are coded 
in a straightforward way. In this case, we imposed no added restrictions when 
describing the relevant? computations. 

The independence relation is defined as TRUE only between operations done 
by non-adjacent processors (and for two idling steps). This simplified relation is 
much easier to use during the proofs than the functional independence relation. 

These are the (nontrivial) proof obligations generated after instantiating all 
the needed infrastructural theories with the above mentioned definitions: 

— The user’s independence relation implies functional independence and is 
symmetric. Proving this requires only local reasoning. 

— Each layer eventually ends. To prove this we used sublemmas that show 
eventual progress by simple induction. 

— The layering function is consistent for one-swap-equivalent computations. 
This is easily proven because the layer function’s definition is local. 

— Consecutive events whose layer values are not in ascending order can be 
swapped. To prove this, we show that any two such events can only involve 
non-contiguous processors, whose operations are independent by definition. 

To prove computation equivalence using the theorem from conv_comps, we need 
to prove that each nonconvenient computation is equivalent to one with a smaller 
measure. Using the theorem from the layered_measure theory, we only need to 
prove that in each non-convenient computation there is an event a that precedes 
an event b where b’s layer value is strictly smaller than a’s. To prove this we show 
that the layer value of an input operation is bigger than that of any operation 
belonging to a processing “wave” that started with a previous input. 

Note that none of the proof obligations involve the specification (sorting is 
not mentioned, the values sorted are not relevant) and all are local or structural 
in nature. Since the layer measure is appropriate to the structure of the system, 
any difficulty in the proofs is technical, not conceptual. 

Abstract. We present an approach to verification that combines the strengths of 
model-checking and theorem proving. We use theorem proving to show a bisimulation 
up to stuttering on a — potentially infinite-state — system. Our characterization of stut- 
tering bisimulation allows us to do such proofs by reasoning only about single steps 
of the system. We present an on-the-fly method that extracts the reachable quotient 
structure induced by the bisimulation, if the structure is finite. If our specification is a 
temporal logic formula, we model-check the quotient structure. If our specification is 
a simpler system, we use an equivalence checker to show that the quotient structure 
is stuttering bisimilar to the simpler system. The results obtained on the quotient 
structure lift to the original system, because the quotient, by construction, is refined 
by the original system. 

We demonstrate our methodology by verifying the alternating bit protocol. This pro- 
tocol cannot be directly model-checked because it has an infinite-state space; however, 
using the theorem prover ACL2, we show that the protocol is stuttering bisimilar to 
a small finite-state system, which we model-check. We also show that the alternating 
bit protocol is a refinement of a non-lossy system. 



1 Introduction 

We propose an approach to verification that combines the strengths of the model-checking 
[CE81,QS82,CES86] and the automated theorem proving [BM79,GM93] approaches. 
We use a theorem prover to reduce an infinite-state (or large finite-state) system to a finite- 
state system, which we then handle using automatic methods. 

The reduction amounts to proving a stuttering bisimulation [BCG88] that preserves pro- 
perties of interest. Two states are stuttering bisimilar if they are equivalent up to next-time 
free CTL* properties (CTL*\X). CTL*\X can be used to state most properties of asynchronous 
systems (including fairness) and many timing-independent properties of synchronous 
hardware. Bisimulation — the usual notion of branching-time equivalence — is not appropriate 
when comparing systems at different levels of abstraction because a single step of the ab- 
stract system may correspond to many steps of the concrete system. Weak bisimulation 
[Mil90] allows such comparisons, but does not preserve CTL*\X properties. We introduce 
well-founded equivalence bisimulation (WEB), a characterization of stuttering bisimulation 
that is based on well-founded bisimulation [Nam97]. A proof that a relation is a WEB involves 
checking that each action of the program preserves the relation. Such single step proofs 
can be checked by theorem provers more readily than proofs based on the original definition 
of stuttering bisimulation. 

A WEB induces a quotient structure that is equivalent (up to stuttering) with the original 
system. The idea is to check the quotient structure, but constructing the quotient structure 
can be difficult because determining if there is a transition between states in the quotient 
structure depends on whether there is a transition between some pair of related states in the 
original system (the number of such pairs may be infinite). Moreover, the quotient structure 
may be infinite-state, but the set of its reachable states may be finite. To address these two 
concerns, we introduce an on-the-fly algorithm that for a large class of systems automatically 
extracts the quotient structure. Once the quotient structure is extracted, we can model-check 
it or we can use a WEB equivalence checker to compare it with another system. 

We are interested in mechanical verification; by this we mean that every step in the 
proof of correctness (except for meta-theory and mechanical tools) is checked mechanically. 
The theorem prover we use is ACL2 [KM97]. ACL2 is an extended version of the Boyer- 
Moore theorem prover [BM79]. ACL2 is based on a first-order, executable logic of total 
recursive functions with induction. We have implemented a μ-calculus model checker with 
Büchi automata, a WEB equivalence checker, and the quotient extraction algorithm in ACL2; 
this allows us to perform all of the verification in ACL2 (this is possible because ACL2 is 
executable). The ACL2 files used are available upon request from the first author. 

We demonstrate our approach by verifying the alternating bit protocol [BSW69]. We 
chose the alternating bit protocol because it has been used as a benchmark for verifica- 
tion efforts, and since this is the first paper to use WEBs for verifying systems, it makes 
sense to compare our results with existing work. The alternating bit protocol has a simple 
description but lengthy hand proofs of correctness (e.g., [BG94]), it is infinite-state, and 
its specification involves a complex fairness property. We have found it to be surprisingly 
difficult to verify mechanically; many previous papers verify various versions of the protocol 
(e.g., [Mil90,CE81,HS96,BG96,MN95]), but all make simplifying assumptions, either by 
restricting channels to be bounded buffers, by ignoring data, or by ignoring fairness issues. 

In the next section, we discuss notation and present the theoretical background, inclu- 
ding the definitions of WEB, quotient structure, and refinement; related theorems are also 
presented. Due to space limitations, proofs of the theorems are omitted; they will appear in a 
future paper. We assume that the reader is familiar with the temporal logic CTL* [EH86]. In 
Section 3, we present the ACL2 formalization of the alternating bit protocol. In Section 4, 
we present the proof of correctness and in Section 5, we present concluding remarks and 
comparisons to other work. 

2 Theoretical Background 

2.1 Preliminaries 

$\mathbb{N}$ denotes the natural numbers, i.e., $\{0, 1, \ldots\}$. Function application is denoted by an infix 
dot “.” and is right associative. $\langle Qx : r : b \rangle$ denotes a quantified expression, where Q is the 
quantifier, x the dummy, r the range of x (true if omitted), and b the body. “Such that” and 
“with respect to” are abbreviated by “s.t.” and “w.r.t.”, respectively. The cardinality of a set 
S is denoted by $|S|$. For a relation R, we write sRw instead of $\langle s, w \rangle \in R$. We write $R(S)$ for 
the image of S under R, i.e., $R(S) = \{y : \langle \exists x : x \in S : xRy \rangle\}$, and $R|_A$ for R left-restricted to 
the set A, i.e., $R|_A = \{\langle a, b \rangle : (aRb) \wedge (a \in A)\}$. A well-founded structure is a pair $\langle W, \prec \rangle$ 
where W is a set and $\prec$ is a binary relation on W s.t. there are no infinitely decreasing 
sequences on W w.r.t. $\prec$. We abbreviate $((s \prec w) \vee (s = w))$ by $s \preceq w$. From highest 
to lowest binding power, we have: parentheses, function application, binary relations (e.g., 
sBw), equality ($=$) and membership ($\in$), conjunction ($\wedge$) and disjunction ($\vee$), implication 
($\Rightarrow$), and finally, binary equivalence ($\equiv$). Spacing is used to reinforce binding: more space 
indicates lower binding. 

Definition 1 (Transition System) 

A Transition System (TS) is a structure $\langle S, \dashrightarrow, L, I, AP \rangle$, where S is a non-empty set of 
states, $\dashrightarrow \subseteq S \times S$ is the transition relation (which must be left total), AP is the set of 
atomic propositions, $L : S \rightarrow 2^{AP}$ is the labeling function which maps each state to the 
subset of atomic propositions that hold at that state, and I is the (non-empty) set of initial 
states. We only consider transition systems with countable branching. 

Definition 2 (Well-Founded Equivalence Bisimulation (WEB)) 

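B is a well-founded equivalence bisimulation on TS $M = \langle S, \dashrightarrow, L, I, AP \rangle$ iff: 

1. B is an equivalence relation on S; and 

2. $\langle \forall s, w \in S : sBw : L.s = L.w \rangle$; and 

3. There exists a function, $rank : S \times S \rightarrow W$, s.t. $\langle W, \prec \rangle$ is well-founded, and 
$\langle \forall s, u, w \in S : sBw \wedge s \dashrightarrow u :$ 

$\quad \langle \exists v : w \dashrightarrow v : uBv \rangle \;\vee$ 

$\quad (uBw \wedge rank.(u, u) \prec rank.(s, s)) \;\vee$ 

$\quad \langle \exists v : w \dashrightarrow v : sBv \wedge rank.(u, v) \prec rank.(u, w) \rangle \rangle$ 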

We will call a pair $\langle rank, \langle W, \prec \rangle \rangle$ satisfying condition 3 in the above definition a well-founded 
witness. Note that to prove a relation is a WEB, reasoning about single steps of 
$\dashrightarrow$ suffices, whereas, to prove a stuttering bisimulation, one has to reason about infinite 
paths (the definition of stuttering bisimulation [BCG88] is essentially the same as the above 
definition except for 3, which states that for any s, w s.t. sBw, any infinite path from s can be 
“matched” by an infinite path from w). It is much simpler to use a theorem prover to reason 
about single steps of $\dashrightarrow$ than it is to reason about infinite paths; this is the motivation for 
the above definition. 
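
For a finite system, the single-step character of the definition means that conditions 2 and 3 can be checked by plain enumeration. The Python sketch below is our illustration only (it is unrelated to the ACL2 proofs): it assumes the relation is already known to be an equivalence (condition 1), and it takes ranks in the natural numbers with the usual order as the well-founded structure.

```python
from typing import Callable, Dict, Hashable, Iterable, Set

State = Hashable


def is_web(
    states: Iterable[State],
    succ: Dict[State, Set[State]],
    label: Dict[State, object],
    related: Callable[[State, State], bool],
    rank: Callable[[State, State], int],
) -> bool:
    """Check conditions 2 and 3 of the WEB definition on a finite system."""
    states = list(states)
    # Condition 2: related states carry the same label.
    for s in states:
        for w in states:
            if related(s, w) and label[s] != label[w]:
                return False
    # Condition 3: every step s -> u is matched from every w related to s.
    for s in states:
        for u in succ[s]:
            for w in states:
                if not related(s, w):
                    continue
                matched = any(related(u, v) for v in succ[w])
                stutter_s = related(u, w) and rank(u, u) < rank(s, s)
                stutter_w = any(
                    related(s, v) and rank(u, v) < rank(u, w) for v in succ[w]
                )
                if not (matched or stutter_s or stutter_w):
                    return False
    return True


# A three-state example: a0 -> a1 -> b -> b, with a0 and a1 both labeled "a".
SUCC = {"a0": {"a1"}, "a1": {"b"}, "b": {"b"}}
LABEL = {"a0": "a", "a1": "a", "b": "b"}
STEPS_TO_LEAVE = {"a0": 1, "a1": 0, "b": 0}


def same_label(s: State, w: State) -> bool:
    return LABEL[s] == LABEL[w]
```

In this example the rank of a pair is taken to be the number of stuttering steps its second component still has to take; with a constant rank the check fails, which shows why the witness is needed.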

Theorem 1 (cf. [BCG88,Nam97]) If B is a WEB on TS M and sBw, then for any CTL*\X 
formula f, $M, s \models f$ iff $M, w \models f$. 

For an equivalence relation B on TS M, a quotient structure M/B (read M “mod” B) 
can be defined, whose states are the equivalence classes of B and whose transition relation is 
derived from the transition relation of M. Quotient structures can be much smaller than the 
original: an equivalence relation with finitely many classes induces a finite quotient structure 
(of a possibly infinite-state system). 

Definition 3 (Quotient Structure) 

Let $M = \langle S, \dashrightarrow, L, I, AP \rangle$ be a TS and let B be a WEB on M. The class of state s is 
denoted by [s]. The quotient structure M/B is the TS $\langle \mathcal{S}, \leadsto, \mathcal{L}, \mathcal{I}, AP \rangle$, where: 

1. $\mathcal{S} = \{[s] : s \in S\}$; and 

2. $\mathcal{L}.C = L.s$, for some s in C (equivalent states have the same label); and 

3. $\mathcal{I} = \{[s] : s \in I\}$; and 

4. The transition relation is given by: For $C, D \in \mathcal{S}$, $C \leadsto D$ iff either 
a) $C \neq D$ and $\langle \exists s, w : s \in C \wedge w \in D : s \dashrightarrow w \rangle$, or 
b) $C = D$ and $\langle \forall s : s \in C : \langle \exists w : w \in C : s \dashrightarrow w \rangle \rangle$ 

(The case distinction is needed to prevent spurious self loops in the quotient structure, 
arising from stuttering steps in the original structure.) 



Theorem 2 (cf. [Nam97]) If B is a WEB on TS M, then there is a WEB on the union of 
M and M/B that relates states from M with their equivalence classes. 



Corollary 1 For any CTL*\X formula f, $M, s \models f$ iff $M/B, [s] \models f$. 

2.2 Quotient Extraction 

We define a class of functions which we call “representative” functions. As we will see, 
representative functions allow us to extract finite quotient structures automatically. 

Definition 4 (Representative Function) 

Let $M = \langle S, \dashrightarrow, L, I, AP \rangle$ be a TS and let B be a WEB on M, with well-founded witness 
$\langle rank, \langle W, \prec \rangle \rangle$. Let $rep : S \rightarrow S$; then rep is a representative function for M w.r.t. B if for 
all $s, w \in S$: 

1. $sBw \equiv rep.s = rep.w$; and 

2. $rep.rep.s = rep.s$; and 

3. $rank.(w, rep.s) \preceq rank.(w, s)$; and 

4. $rank.(rep.s, rep.s) \preceq rank.(s, s)$ 



Theorem 3 Let rep be a representative function for TS $M = \langle S, \dashrightarrow, L, I, AP \rangle$ w.r.t. WEB 
B. Let $S' = rep(S)$, and let $M' = \langle S', \leadsto, L|_{S'}, rep(I), AP \rangle$, where $s \leadsto u$ iff 
$\langle \exists v : s \dashrightarrow v : rep.v = u \rangle$. Then M' is M/B, up to a renaming of states. 

Representative functions are very useful (when they exist) because they identify states 
that have all of the branching behavior of their class. They allow one to view the quotient 
structure as a submodel of the original structure, and they are used in the following on-the-fly 
algorithm for constructing quotient structures. 

Algorithm 1 Quotient Construction 

Given a representative function, rep, for $M = \langle S, \dashrightarrow, L, I, AP \rangle$ w.r.t. B, one can construct 
the reachable quotient structure induced by B if $rep(I)$ is finite and computable, and if for 
all $s \in S$, $rep(\dashrightarrow(s))$ is finite and computable. We start by mapping I to $rep(I)$ and then 
explore the state space, e.g., by a breadth first traversal. Given a state, s, in the induced 
quotient structure (recall that s is also a state in the original structure), we compute the set 
$rep(\dashrightarrow(s))$, which is the set of next states of s in the quotient structure. This process is 
repeated until no new states are generated. If the set of reachable quotient structure states 
is finite, the process will terminate. 
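
The traversal can be sketched in a few lines of Python. This is an illustrative rendering, not the ACL2 implementation; the function and parameter names are ours, and the caller is assumed to supply a genuine representative function together with a computable successor function.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Set

State = Hashable


def extract_quotient(
    initial: Iterable[State],
    successors: Callable[[State], Iterable[State]],
    rep: Callable[[State], State],
) -> Dict[State, Set[State]]:
    """Breadth-first construction of the reachable quotient structure.

    Returns a map from each reachable representative to its set of next
    representatives; terminates when the reachable quotient is finite."""
    transitions: Dict[State, Set[State]] = {}
    queue = deque({rep(s) for s in initial})
    while queue:
        s = queue.popleft()
        if s in transitions:
            continue  # already explored
        nexts = {rep(v) for v in successors(s)}
        transitions[s] = nexts
        queue.extend(n for n in nexts if n not in transitions)
    return transitions
```

As a toy usage example, take the infinite-state counter whose states are the natural numbers, with a step from n to n + 1 and with the parity of n as its label. Relating numbers of equal parity and choosing rep.n = n mod 2 (with a constant rank), the extracted quotient has the two states 0 and 1, each stepping to the other.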

2.3 Refinement 

In this section, $M = \langle S, \dashrightarrow, L, I, AP \rangle$ and $M' = \langle S', \dashrightarrow', L', I', AP' \rangle$. M and M' are 
isomorphic if there is a bijection $f : S \rightarrow S'$ s.t. $s \dashrightarrow w$ iff $f.s \dashrightarrow' f.w$, and $f(I) = I'$. M 
and M' are $\beta$-isomorphic if they are isomorphic, $\beta$ is a subset of both AP and AP', and L 
and L' agree when restricted to $\beta$, i.e., for any $p \in \beta$, $p \in L.s$ iff $p \in L'.f.s$ for all s. We 
say M and M' are WEB if AP = AP' and there are WEBs on M and M' s.t. the quotient 
structures induced are AP-isomorphic. M and M' are $\beta$-WEB if $\beta$ is a subset of both AP 
and AP' and the structures obtained from M and M' by restricting L and L' to $\beta$ are WEB. 
If M and M' are AP'-WEB, then we say that M is a refinement of M'. 

Theorem 4 (Refinement) 

1. If M is a refinement of M', then any CTL*\X formula that holds in M' holds in M. 

2. If M and M'' are $\beta$-isomorphic, M'' is a refinement of M', and AP' is a subset of $\beta$, then 
M is a refinement of M'. 

Note that the converse of the first part of the theorem does not hold because AP may be 
a proper superset of AP' . Refinement in a branching- time framework corresponds to refining 
atomicity in such a way that when the variables introduced for the refinement are hidden, 
the resulting system and the original system are WEB. Refinement depends crucially on 
stuttering [Lam80] because we are comparing systems at differing levels of abstraction and 
any reasonable correctness condition will not make assumptions about how long it takes for 
something to happen, i.e., the condition should be stuttering insensitive (i.e., the condition 
will not use X, the next-time temporal operator). 

3 Protocol 

The alternating bit protocol is used to implement reliable communication over faulty channels.
We present the protocol from the view of the sender and receiver first and then in
complete detail. The sender interacts with the communication system via the register smsg
and the flag svalid. The sender can assign a message to smsg provided it is invalid, i.e.,
svalid is false. The receiver interacts with the communication system via the register rmsg
and the flag rvalid. The receiver can read rmsg provided it is valid, i.e., rvalid is true;
when read, rmsg is invalidated. Figure 1 depicts the protocol from this point of view.



Fig. 1. Protocol from sender's and receiver's view (diagram of the Sender and Receiver blocks)



The communication system consists of the flags sflag and rflag as well as the two lossy,
unbounded, and FIFO channels s2r and r2s. The idea behind the protocol is that the contents
of smsg are sent across s2r until an acknowledgment for the message is received on r2s, at
which point a new message can be transmitted. Similarly, acknowledgments for a received
message are sent across r2s until a new message is received. In order for the receiving end
to distinguish between copies of the same message and copies of different messages, each
message is tagged with sflag before being placed on s2r. When a new message is received,
rflag is assigned the value of the message tag and gets sent across r2s; this also allows the
sending end to distinguish acknowledgments. There may be an arbitrary number of copies
of a message (or an acknowledgment) on the channels, and it turns out that there are at
most two distinct messages (or acknowledgments) on the channels, hence binary flags suffice.
Figure 2 depicts the protocol.

Fig. 2. Alternating Bit Protocol

The above discussion is informal; a formal description follows, but first we discuss notation.
We have formalized the protocol and its proof in ACL2; however, for presentation
purposes we describe the formalization using standard notation. We remain faithful to the
ACL2 formalization, e.g., we do not use types: functions that appear typed are really
underspecified, but total. The concatenation operator on sequences is denoted by ':', but
sometimes we use juxtaposition; '$\epsilon$' denotes the empty sequence; head.s is the first element of
sequence s; tail.s is the sequence resulting from removing the first element from s; |s| is the
size of the sequence. Messages are pairs; info returns the first component of a message and
flag returns the second.

A state is an eight-tuple $\langle sflag, svalid, smsg, s2r, r2s, rflag, rvalid, rmsg \rangle$; state is a
predicate that recognizes states. The sflag of state s is denoted sflag.s and similarly for the other
fields. Rules are functions from states into states; they are listed in Table 1 and are of the
form $Q \to A$; if $A$ is used as a rule, it abbreviates $true \to A$. Rule $Q \to A$ defines the function
$(\lambda s : \text{if } Q.s \text{ then } A.s \text{ else } s)$. We now define the transition relation, $R$ (corresponding to
$\to$ in the previous section): $sRw$ iff s is a state and w can be obtained by applying some rule
to s.
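A guarded rule and the induced transition relation can be sketched in a few lines. This is an illustration only: states are modelled as Python dictionaries, and the helper names `rule`, `always` and `related` are assumptions introduced here, not part of the ACL2 formalization.

```python
def rule(guard, action):
    """State function for the rule `guard -> action`: apply action if the guard holds, else no-op."""
    return lambda s: action(s) if guard(s) else s

def always(s):
    # Guard of an unguarded rule A, which abbreviates true -> A.
    return True

skip = rule(always, lambda s: s)
reply = rule(always, lambda s: {**s, "rvalid": False})
send_msg = rule(lambda s: s["svalid"],
                lambda s: {**s, "s2r": s["s2r"] + ((s["smsg"], s["sflag"]),)})

def related(s, w, rules, is_state=lambda s: True):
    """sRw iff s is a state and w is obtained by applying some rule to s."""
    return is_state(s) and any(r(s) == w for r in rules)

# A partial example state (only the fields used by the three rules above).
s = {"svalid": False, "smsg": 0, "sflag": False, "s2r": (), "rvalid": True}
```

Because a rule whose guard fails returns its argument unchanged, every rule is a total function on states, which matches the "underspecified, but total" convention mentioned earlier.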

We have defined the states and transition relation of the alternating bit protocol. The
states are labeled with an eight-tuple, as mentioned above. It should be clear that we can
convert this type of labeling into a labeling over atomic propositions (boolean variables) by
introducing enough (in this case an infinite number of) atomic propositions; therefore, the
alternating bit protocol defines a TS, ABP.

4 Protocol Verification 

We give an overview of the verification of the alternating bit protocol. ABP'' is the
alternating bit protocol, with some variables distorted. Let $\beta$ be the set of variables that are not
distorted; then ABP and ABP'' are $\beta$-isomorphic. We define a relation B and prove that
B is a WEB on ABP''. We define rep, a representative function on ABP'' w.r.t. B. We use
our extraction procedure to extract the structure defined by rep. ABP' is this structure,
restricted to $\beta$. We model-check ABP'; by Theorem 4, ABP is a refinement of ABP' and
any CTL*\X formulae that hold on ABP' also hold on ABP.

We also show that ABP' is WEB to a non-lossy protocol; in many cases such a check is
more convincing than model-checking because it shows that one system is a refinement of
another.

4.1 Well-Founded Equivalence Bisimulation

In this subsection we define a relation B and outline the ACL2 proof that B is a WEB. We
start with some definitions.
Table 1. Rules defining the transition relation

Rule        Definition
Skip        skip
Accept.m    $\neg svalid \to smsg, svalid := m, true$
Send-msg    $svalid \to s2r := s2r : \langle smsg, sflag \rangle$
Drop-msg    $s2r \neq \epsilon \to s2r := tail.s2r$
Get-msg     $s2r \neq \epsilon \land \neg rvalid \to$
              if $flag.head.s2r = rflag$
              then $s2r := tail.s2r$
              else $s2r, rmsg, rvalid, rflag := tail.s2r, info.head.s2r, true, flag.head.s2r$
Send-ack    $r2s := r2s : rflag$
Drop-ack    $r2s \neq \epsilon \to r2s := tail.r2s$
Get-ack     $r2s \neq \epsilon \to$
              if $head.r2s = sflag$
              then $r2s, svalid, sflag := tail.r2s, false, \neg sflag$
              else $r2s := tail.r2s$
Reply       $rvalid := false$
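The rules of Table 1 can be transcribed as functions on an eight-field state. The sketch below is an illustration under stated assumptions: the Get-ack branch (invalidate smsg and flip sflag) follows the informal description of the protocol, and the initial state `s0` with sflag different from rflag is chosen here for the example rather than taken from the paper.

```python
from typing import NamedTuple, Tuple

class State(NamedTuple):
    sflag: bool
    svalid: bool
    smsg: int
    s2r: Tuple[Tuple[int, bool], ...]   # messages in transit: (info, flag) pairs
    r2s: Tuple[bool, ...]               # acknowledgments in transit
    rflag: bool
    rvalid: bool
    rmsg: int

def accept(s, m):
    return s if s.svalid else s._replace(smsg=m, svalid=True)

def send_msg(s):
    return s._replace(s2r=s.s2r + ((s.smsg, s.sflag),)) if s.svalid else s

def drop_msg(s):
    return s._replace(s2r=s.s2r[1:]) if s.s2r else s

def get_msg(s):
    if not s.s2r or s.rvalid:
        return s
    info, flag = s.s2r[0]
    if flag == s.rflag:                 # copy of an already received message
        return s._replace(s2r=s.s2r[1:])
    return s._replace(s2r=s.s2r[1:], rmsg=info, rvalid=True, rflag=flag)

def send_ack(s):
    return s._replace(r2s=s.r2s + (s.rflag,))

def drop_ack(s):
    return s._replace(r2s=s.r2s[1:]) if s.r2s else s

def get_ack(s):
    if not s.r2s:
        return s
    if s.r2s[0] == s.sflag:             # acknowledgment of the current message
        return s._replace(r2s=s.r2s[1:], svalid=False, sflag=not s.sflag)
    return s._replace(r2s=s.r2s[1:])

def reply(s):
    return s._replace(rvalid=False)

# One loss-free round trip from an assumed initial state.
s0 = State(True, False, 0, (), (), False, False, 0)
s5 = get_ack(send_ack(get_msg(send_msg(accept(s0, 1)))))
```

After the round trip the receiver holds message 1 and the sender is again ready to accept a new message, with its flag flipped.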



For the following definitions, a and b are sequences of length 1, $a \neq b$, and x is an arbitrary
finite sequence. The function compress acts on sequences to remove adjacent duplicates.
Formally,

$compress.\epsilon = \epsilon$                  $compress.a = a$

$compress.aax = compress.ax$        $compress.abx = a : compress.bx$
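The four equations defining compress translate directly into a short loop. The version below is a sketch that represents sequences as Python tuples (an assumption made here for illustration).

```python
def compress(xs):
    """Remove adjacent duplicates from a sequence, returning a tuple."""
    out = []
    for x in xs:
        if not out or out[-1] != x:   # keep x only if it differs from its predecessor
            out.append(x)
    return tuple(out)
```

For instance, `compress("aabbba")` keeps one element per run of equal neighbours and so yields the three-element sequence a, b, a.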

The predicate good-s2r recognizes sequences that define valid channel contents. Formally,

$good\text{-}s2r.\epsilon = true$        $good\text{-}s2r.ax = (a = \langle info.a, flag.a \rangle) \land good\text{-}s2r.x$

The function s2r-state compresses the s2r field of a state, except that already received
messages at the head of s2r are ignored. Formally,

$s2r\text{-}state.s = compress.relevant\text{-}s2r.(s2r.s, \langle rmsg.s, rflag.s \rangle)$

where the function relevant-s2r is defined by:

$relevant\text{-}s2r.(\epsilon, a) = \epsilon$        $relevant\text{-}s2r.(bx, a) = bx$

$relevant\text{-}s2r.(ax, a) = relevant\text{-}s2r.(x, a)$

The function r2s-state compresses the r2s field of a state, except that acknowledgements at
the head of r2s with a flag different from sflag are ignored. Formally,

$r2s\text{-}state.s = compress.relevant\text{-}r2s.(r2s.s, sflag.s)$

where the function relevant-r2s is defined by:

$relevant\text{-}r2s.(\epsilon, a) = \epsilon$        $relevant\text{-}r2s.(ax, a) = ax$

$relevant\text{-}r2s.(bx, a) = relevant\text{-}r2s.(x, a)$
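Both relevant functions strip a prefix of the channel: relevant-s2r drops leading elements equal to a, while relevant-r2s drops leading elements different from a. A minimal sketch, assuming tuples for sequences and using the standard `itertools.dropwhile`:

```python
from itertools import dropwhile

def relevant_s2r(seq, a):
    """Drop messages at the head of seq that equal the already received message a."""
    return tuple(dropwhile(lambda m: m == a, seq))

def relevant_r2s(seq, a):
    """Drop acknowledgments at the head of seq whose flag differs from a."""
    return tuple(dropwhile(lambda f: f != a, seq))
```

Only the prefix is affected in either case; elements after the first retained one are kept even if they would have been dropped at the head.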

The main idea behind the bisimulation is to relate states that have similar compressed
channels (i.e., are equivalent under s2r-state and r2s-state) and are otherwise identical.
We define the bisimulation in terms of rule

rep : $good\text{-}s2r.s2r \to s2r, r2s := s2r\text{-}state, r2s\text{-}state$

We now define our proposed WEB B: $sBu$ iff $rep.s = rep.u$. It is easy to see that B
is an equivalence relation that, except for s2r and r2s, preserves the labeling of states. We
define rank, a function on states, as follows: $rank.s = |s2r.s| + |r2s.s|$.
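Putting the pieces together, rep, B and rank can be sketched as below. This is an illustrative transcription, not the ACL2 definitions: the `State` tuple, the tuple encoding of channels and the example state `s` are assumptions made here.

```python
from itertools import dropwhile
from typing import NamedTuple, Tuple

class State(NamedTuple):
    sflag: bool
    svalid: bool
    smsg: int
    s2r: Tuple[Tuple[int, bool], ...]
    r2s: Tuple[bool, ...]
    rflag: bool
    rvalid: bool
    rmsg: int

def compress(xs):
    out = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return tuple(out)

def good_s2r(seq):
    # Every channel element must be an (info, flag) pair.
    return all(isinstance(m, tuple) and len(m) == 2 for m in seq)

def rep(s):
    """Rule rep: compress both channels after discarding the irrelevant prefixes."""
    if not good_s2r(s.s2r):
        return s
    s2r = compress(dropwhile(lambda m: m == (s.rmsg, s.rflag), s.s2r))
    r2s = compress(dropwhile(lambda f: f != s.sflag, s.r2s))
    return s._replace(s2r=s2r, r2s=r2s)

def B(s, u):
    return rep(s) == rep(u)

def rank(s):
    return len(s.s2r) + len(s.r2s)

# Two copies of message 1 in transit and two stale acknowledgments.
s = State(True, True, 1, ((1, True), (1, True)), (False, False), False, False, 0)
```

On this example rep keeps a single copy of the message and discards both stale acknowledgments, so the rank of the representative is no larger than the rank of the original state.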

We will show that $\langle rank, \langle \mathbb{N}, < \rangle \rangle$ is a well-founded witness (to be pedantic we can define
rank so that it has two arguments, as follows: $rank.(u,s) = |s2r.s| + |r2s.s|$). Note that if
$sBw$, $sRu$, and $sBu$, then $uBw$ and by rule Skip, $wRw$; therefore, we need only concern
ourselves with the case where $\neg sBu$. To show B is a WEB, it suffices to show:

$sBw \land sRu \land \neg sBu \Rightarrow \langle \exists v : wRv : uBv \lor (sBv \land rank.v < rank.w) \rangle$

We break up the proof (that B is a WEB) into the eight cases in Table 2 by expanding
R, i.e., by considering all the ways in which s can be related to u. Each case pairs a Rule
with a Lemma; when u or v appear in a Lemma they abbreviate the terms Rule.s and Rule.w,
respectively. We prove the cases in ACL2.



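Table 2. WEB case analysis

Rule        Lemma
Accept.m    $sBw \Rightarrow uBv$
Send-msg    $sBw \land \neg sBu \Rightarrow uBv$
Drop-msg    $sBw \land \neg sBu \Rightarrow (uBv) \lor (sBv \land rank.v < rank.w)$
Get-msg     $sBw \land \neg sBu \land u \neq Drop\text{-}msg.s \Rightarrow (uBv) \lor (sBv \land rank.v < rank.w)$
Send-ack    $sBw \land \neg sBu \Rightarrow uBv$
Drop-ack    $sBw \land \neg sBu \Rightarrow (uBv) \lor (sBv \land rank.v < rank.w)$
Get-ack     $sBw \land \neg sBu \land u \neq Drop\text{-}ack.s \Rightarrow (uBv) \lor (sBv \land rank.v < rank.w)$
Reply       $sBw \Rightarrow uBv$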



In order to tie up the case analysis, we define a function step that takes three states, s, u,
and w, as arguments. If $sBu$, step returns w, else if $u = A.s$, for A a rule from Table 1,
step returns $A.w$, else step returns w. Since we proved that B is an equivalence relation,
the following theorem implies that B is a WEB (existential quantification is replaced by the
witness function step):

$sBw \land sRu \land v = step.(s, u, w) \Rightarrow wRv \land (uBv \lor (sBv \land rank.v < rank.w))$

4.2 Quotient Extraction 

In this subsection we prove the following ACL2 theorems, which show that rep is a
representative function satisfying the requirements of Theorem 3; hence, the quotient structure
induced by rep is isomorphic to the quotient structure w.r.t. B: $sBw \equiv (rep.s = rep.w)$,
$rep.rep.s = rep.s$, and $rank.rep.s \leq rank.s$. We extract the quotient structure (induced by
rep) of the alternating bit protocol restricted to binary messages. In the following
subsections, we describe the use of model-checking and WEB equivalence checking to analyze this
structure.

We now have enough machinery to describe how refinement is used in the verification of
the alternating bit protocol. ABP is the model of the alternating bit protocol in ACL2. ABP''
is ABP with s2r, r2s relabeled by s2r-state and r2s-state, respectively. B is a bisimulation on
ABP'' with well-founded witness $\langle rank, \langle \mathbb{N}, < \rangle \rangle$, s.t. $rank.(u,s) = |s2r.f^{-1}.s| + |r2s.f^{-1}.s|$
($f$ is the bijection between ABP and ABP''; recall that rank is defined on states of ABP'').
The quotient structure of ABP'' w.r.t. B is isomorphic to the structure induced by rep.
ABP' is this structure, with s2r and r2s hidden. It is ABP' that we analyze in the next two
subsections. By Theorem 4, ABP is a refinement of ABP' and properties of ABP' can be
lifted to ABP.



4.3 Model-Checking 

We model-check the quotient structure extracted by the above mentioned procedure, using
a $\mu$-calculus model-checker and a fair-CTL to $\mu$-calculus translator, both written in ACL2.
We check the following formulae (written in CTL*\X):

1. $AG(sending1 \Rightarrow A(sending1 \; W \; rmsg = 1))$

2. $AG(receiving1 \Rightarrow A(receiving1 \; W \; delivered1))$

3. AG EF svalid (acceptance of a new message is always eventually possible)

where sending1, receiving1, and delivered1 are abbreviations for $svalid \land smsg = 1$,
$rvalid \land rmsg = 1$, and $\neg rvalid \land rmsg = 1$, respectively; formulae analogous to 1 and 2
are proved for message 0. All of the above formulae hold on the extracted structure, which is
what one would expect. The property AG AF svalid (acceptance of a new message is always
eventually guaranteed), however, does not hold without further fairness assumptions.

The liveness properties are as follows. Each property is shown under a set of fairness
assumptions on the actions of the process. These are either weak fairness (infinitely often
disabled or infinitely often executed) or strong fairness (infinitely often enabled implies
infinitely often executed).

1. $AG(sendingNew1 \Rightarrow A(sending1 \; U \; rmsg = 1))$ (sendingNew1 represents the
sending of a new copy of message 1): This holds under weak fairness on the Send-msg
and Reply actions, and strong fairness on the receipt of a new message by the action
Get-msg. A similar property holds for message 0.

2. AG AF svalid: This holds under the fairness assumptions for the previous property, along
with weak fairness on the Send-ack action and strong fairness on the receipt of a new
acknowledgment by the action Get-ack.

Since the fairness conditions mention actions, we compose Büchi automata accepting
fair paths with the quotient structure and model-check the resulting structure on fair-CTL
formulae which refer both to the propositions of the quotient structure and the accepting
states of the automata.

We use an argument based on bisimulation to derive sufficient conditions for
data-independence [Wol86] of the protocol. These are verified in ACL2; as a consequence, the
properties shown above for the data domain {0, 1} suffice to show similar properties for
arbitrary data domains.

4.4 Bisimulation Checking 

In many cases, the correctness proof is more convincing if we can show that the extracted
model is bisimilar to a model that is so simple that it is correct by inspection. In the case of
the alternating bit protocol, we can show that the extracted model is bisimilar to a simple,
non-lossy version of the protocol, presented in Table 3.

We use a WEB equivalence checker (based on the description in [BCG88]) written in
ACL2 to verify that the non-lossy protocol in Table 3 and the extracted protocol are WEB.
The main idea is that we create the disjoint union of the transition systems corresponding to
the extracted protocol and the non-lossy protocol. The algorithm will compute the coarsest
WEB on a structure; hence, if the initial states of the two systems are in the same class,
the two systems are WEB. In computing the coarsest WEB, we examine only svalid, smsg,
rvalid, and rmsg. Notice that this view is exactly the one presented in Figure 1.



Table 3. Rules defining the transition relation of the non-lossy protocol

Rule        Definition
Accept.m    $\neg svalid \to smsg, svalid := m, true$
Send-msg    $svalid \land \neg rvalid \land \neg sent \to rvalid, sent, rmsg := true, true, smsg$
Ready       $sent \to svalid, sent := false, false$
Reply       $rvalid := false$



5 Related Work and Conclusions 

Among related work, [MN95] prove safety properties of the alternating bit protocol by using
Isabelle/HOL to prove that a manually constructed finite-state system contains all of the
traces of the alternating bit protocol and then model-check the finite-state system. [HS96]
show the correctness of an infinite-state system by using PVS to verify that a simple manually
constructed finite-state system is a conservative approximation of the infinite-state system.
The work described in this paper improves upon such methods by (i) using a (verified)
representative function to automatically construct a quotient structure, and (ii) using WEBs
instead of simulations or trace containment: this allows us to check properties exactly, i.e.,
if a property holds (fails) on the simple system, then it holds (fails) on the original system.

There are several known types of infinite-state systems (e.g., [ACD90,GS92,AJ96,EN95])
for which the model-checking problem is decidable, but these types of systems often turn out
to be too specialized for many cases where it is possible to devise finite abstractions. There
have been several approaches to automatically verifying the alternating bit protocol: safety
properties of such lossy channel systems are decidable [AJ96]; however, in order to construct
automatic abstractions that demonstrate liveness properties, most other verifications of the
alternating bit protocol (e.g., [GS97]) consider channels to be bounded.

Mechanical verification is necessary. In our case, we managed to convince ourselves that 
a candidate relation was a WEB for the alternating bit protocol, even though it was not; 
this became clear only when we tried to prove it mechanically. 

An interesting direction for future work is to apply the methodology presented here to
the verification of other infinite-state systems (e.g., pipelined and out-of-order execution
machines and memory coherence protocols).
Abstract. We describe how three hardware components (two combinational
and one pipelined) for computing the Fast Fourier Transform have
been proved equivalent using an automatic combination of symbolic
simulation, rewriting techniques, induction and theorem proving. We also
give some advice on how to verify circuits operating on complex data,
and present a general purpose proof strategy for equivalence checking
between combinational and pipelined circuits.



1 Introduction 

FFT components are a challenge to verify as they compute complex functions
involving many arithmetic operations. Bit-level correctness proofs for such circuits
are not within the reach of today's technology; an appropriate level of modelling
is therefore on the level of individual arithmetic operations on signals carrying
numerical data.

In order to make verification techniques industrially interesting, it is
generally agreed that a high degree of automation is desirable. Unfortunately,
classical automatic methods such as propositional logic tautology checking or model
checking cannot be immediately applied at this level of abstraction. Different
extensions of model checking with uninterpreted functions encoded in BDDs
have been proposed [VB98]; we instead use theorem proving, but in such a way
that no user guidance is needed during the proofs.

As we aim for verification at the arithmetic level, it is imperative to structure
the proofs to be as simple as possible; we therefore devise heuristics for the
particular class of circuits we verify and apply automatic analyses that aim to
reduce the work that has to be done in the theorem prover. To this end we use
the Lava hardware development platform, which has a powerful language in which
we can implement our analyses and write parametrisable scripts that control
complex theorem prover interactions [BCSS98].

The work described is an industrial case study with Ericsson Cadlab, Stockholm.

2 The Lava Hardware Development Platform 

Lava is a hardware description language and a framework for hardware verification
developed at Chalmers and Xilinx [BCSS98]. One of the principal uses of
Lava is as a platform for hardware verification experiments.

Lava is embedded in the functional language Haskell; all aspects of the
development of hardware from descriptions down to the interfacing to layout tools are
expressed in the same language. The use of a polymorphic high level language
that supports higher order functions gives very concise hardware descriptions
and allows us to devise combinators that capture common design patterns.

The circuit descriptions can be interpreted by symbolic evaluation in a
number of different ways; examples of built in standard analyses are circuit
simulation, generation of logical formulas in formats suitable for external theorem
provers and generation of VHDL. The verification interpretation is parametrised
over the proof procedure and allows the passing of optional proof parameters; a
user can therefore quickly retarget from one proof procedure to another without
losing fine grain control.



3 The Fast Fourier Transforms 

The Fast Fourier Transforms (FFTs) are efficient algorithms for computing a
length N sequence of complex numbers X given an initial sequence x and a
constant $W_N = e^{-j 2\pi / N}$, defined as

$$X(k) = \sum_{n=0}^{N-1} x(n) \cdot W_N^{nk}, \qquad k \in \{0 \ldots N-1\}$$

The FFTs exploit symmetries in the twiddle factors $W_N^k$ together with
restrictions of sequence lengths (for example to powers of two) to reduce the number
of necessary computations. Examples of twiddle factor laws that express useful
symmetries are

$$W_n^0 = 1$$

$$W_n^n = 1$$

$$W_n^a \cdot W_n^b = W_n^{a+b}$$

$$W_N^k = W_{nN}^{nk}, \qquad (n, k \leq N)$$
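The definition of the transform and the kind of symmetry the twiddle factor laws express can be checked numerically. The sketch below assumes the usual convention $W_N = e^{-j2\pi/N}$; the function names `W` and `dft` are introduced here for illustration, and floating-point comparison stands in for exact complex arithmetic.

```python
import cmath

def W(k, N):
    """Twiddle factor W_N^k under the assumed convention W_N = exp(-2*pi*j/N)."""
    return cmath.exp(-2j * cmath.pi * k / N)

def dft(x):
    """Direct evaluation of X(k) = sum_n x(n) * W_N^(n*k) for k = 0 .. N-1."""
    N = len(x)
    return [sum(x[n] * W(n * k, N) for n in range(N)) for k in range(N)]

def close(a, b, eps=1e-9):
    return abs(a - b) < eps

# Spot checks of twiddle factor symmetries for N = 8.
unit_laws = close(W(0, 8), 1) and close(W(8, 8), 1)
product_law = close(W(3, 8) * W(2, 8), W(5, 8))
scaling_law = close(W(3, 8), W(6, 16))
```

The direct evaluation costs on the order of N squared multiplications, which is the work the FFTs reduce by exploiting such identities.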

The FFT algorithms are often implemented in combinational hardware, and
are key building blocks in signal processing applications; the FFTs are rumoured
to be the world's most implemented algorithms in hardware.

The reference FFT is the decimation in time Radix-2 algorithm, which
operates on input sequences whose length is a power of two [PM92]. If the input
length is also a power of four, the decimation in frequency Radix-$2^2$ FFT can
be applied [He95]. From a designer's point of view the question is whether the
combinational circuits that implement these algorithms are equivalent. As the
networks are fundamentally different, verification of equivalence is a non-trivial
undertaking.
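For orientation, the textbook recursive form of the decimation in time Radix-2 algorithm can be compared against the direct transform on concrete data. This is a software sketch under the convention $W_N = e^{-j2\pi/N}$, not the Lava circuit description, and a numerical comparison on one input is of course far weaker than the symbolic equivalence proofs discussed in this paper.

```python
import cmath

def dft(x):
    """Direct O(N^2) evaluation of the discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft_radix2(x):
    """Recursive decimation in time Radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_radix2(x[0::2])          # transform of even-indexed samples
    odd = fft_radix2(x[1::2])           # transform of odd-indexed samples
    out = [0] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor W_N^k
        out[k] = even[k] + t                              # butterfly
        out[k + N // 2] = even[k] - t
    return out

sample = [1, 2, 3, 4, 0, -1, -2, 5]
max_error = max(abs(a - b) for a, b in zip(fft_radix2(sample), dft(sample)))
```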

Combinational implementations are not the only ones possible; pipelined
sequential designs can use less circuit area by trading space for time. A pipelined
implementation of a size $2^n$ Radix-$2^2$ FFT (see figure 1) consists of two simple
kinds of combinational components (C1 and C2) that together form a stage; a
whole circuit consists of n/2 stages. Each primitive block is controlled by
synchronisation signals generated by an n-bit counter. This counter also addresses
a multi port memory that outputs streams of twiddle factors that are multiplied
together with the outputs of each stage.

Fig. 1. Structure of pipelined implementation of a size 64 Radix-$2^2$ FFT (three stages, each a BF1 block followed by a BF2 block)

Figure 2 shows how the pipelined FFT circuit simulates the corresponding
combinational circuit over time by reading the inputs in the first sequence of
input values IF(0) while spitting out undefined outputs until time lag ($2^n - 1$
for a size $2^n$ FFT) when the first element of the output sequence OF(0) is
generated; the lag time is always constant. At the same time as the outputs
are produced, inputs from a new input sequence are read so that the circuit
continuously processes data.

Fig. 2. Operation of the pipelined circuit (time axis marked at 0, lag, lag+1*$2^n$, lag+2*$2^n$)
4 FFT Low-Level Descriptions

The FFT descriptions are parameterised by the circuit size and are formulated 
using a number of simple circuits and combinators that are useful for signal 
processing applications. 

A key point is that the regularity of the combinational networks makes the 
circuits very easy to describe in Lava; the description of the Radix-2 FFT in 
terms of the signal processing combinators is just 3 lines long (see appendix A). 

The Lava circuit descriptions can be used to automatically generate structural
VHDL for all parts of the implementations with the exception of the multi
port memory component.



5 Verification of Components 

As we want automatic proofs, we will only be concerned with equivalence
checking for fixed size circuits. We will also exploit designer knowledge and use Lava
analyses in order to make the proofs tractable for the external proof procedure.
The circuits are modelled on the level of operations on infinite precision
complex numbers; this modelling is appropriate as finite representations of complex
numbers can only be used for approximate calculation of the FFT. A reasonable
notion of implementation equivalence must therefore be defined in terms of
infinite precision complex arithmetic.

As a shorthand, we adopt the convention that

$$F(x, y) = F(x(0) .. x(i-1), y(0) .. y(i-1))$$

if $F \in Form$ (the set of first order logic formulas) and $x, y \in S^i$, where S is any
non-empty set.

5.1 Theoretical Basis of the Verifications 

Combinational circuits can be viewed as functions $f$ from input to output. Lava's
symbolic evaluation can generate formulas $\delta_f$ that define the functions we are
concerned with in the sense that $T \vdash \delta_f(I, O) \Leftrightarrow f(I) = O$ if $T$ is a theory
containing theorems that are true in a standard interpretation of complex arithmetic.

The formulas that are constructed in the following verifications are expressed
in first order logic with equality, and contain variables and two-place function
symbols plus, sub, tim and W. The circuit equivalence checking problem is
reduced to showing that certain formulas that capture implementation
equivalence are members of the theory $T$, which we give axioms for. The axioms are
well-known properties of complex arithmetic and some twiddle factor identities.
We know that the axioms hold in the interpretation $\mathcal{I}$ that complies with the
following conditions

- The domain is the set of complex numbers

- plus designates complex addition

- sub designates complex subtraction

- tim designates complex multiplication

- W designates the function $f_W(k, N) = W_N^k$

All formulas that are derivable from the axioms in a sound proof system are
therefore also true in $\mathcal{I}$.

5.2 Combinational FFT Verification 

Are the abstract implementations of the Radix-2 and the Radix-$2^2$ FFT
equivalent for sizes that are a power of four?

The fixed size FFT circuits are functions $F_1(I)$ and $F_2(I)$ from complex input
sequences to complex output sequences. Lava's symbolic evaluation can generate
formulas $\delta_1$ and $\delta_2$ that define these functions. Our criterion for equivalence of
the combinational FFT is that

$$\delta_1(I, O_1) \land \delta_2(I, O_2) \Rightarrow O_1 = O_2$$

Instead of generating the two defining formulas individually and then
combining them into a resulting formula, we can construct a test bench circuit
that directly generates the correctness formula when interpreted symbolically:

    fftSame n =
      do inp  <- newCmplxVector (4^n)
         out1 <- radix2 (2*n) inp
         out2 <- radix22 n inp
         equals (out1, out2)

The test bench builds a vector of unrestricted complex variables, which are given
to both FFT implementations. The resulting output sequences are then
point-wise compared to each other for equality. If the formula describing this system
is derivable by the theorem prover using the axioms for the theory $T$, then it is
true in the model $\mathcal{I}$ and the implementations are equivalent.

Lava's verification interpretation takes a test bench circuit and a proof
procedure with some arguments, and automatically generates formulas and runs the
proof. The manual step that has to be taken is to choose a prover and possibly
give proof options. In this case, we have to choose a first order logic theorem 
prover, and specify some axioms. These include some simple algebraic laws for 
the arithmetic operators, such as distributivity of multiplication over addition 
and that 1 is a unit element for multiplication. The twiddle factor identities from 
section 3 are also necessary. 

Although these axioms with any first order logic prover are in theory sufficient 
to prove the circuits equivalent, the number of consequences grows very quickly 
if the rules are applied mindlessly. This combined with the fact that the FFT 
circuits generate formulas that for larger sizes grow to be megabytes big means 
that we must give extra proof options in order to make the proofs tractable. 
Symbolic evaluation of the FFTs for 4 abstract inputs reveals some interesting 
circuit properties (the input and output vectors are indexed backwards): 
    Lava> symbolic_eval (radix2 2)
    [(x3 - W(2, 0) * x1) - W(4, 1) * (x2 - W(2, 0) * x0),
     (W(2, 0) * x1 + x3) - W(4, 0) * (W(2, 0) * x0 + x2),
     W(4, 1) * (x2 - W(2, 0) * x0) + (x3 - W(2, 0) * x1),
     W(4, 0) * (W(2, 0) * x0 + x2) + (W(2, 0) * x1 + x3)
    ]

    Lava> symbolic_eval (radix22 1)
    [W(4, 0) * ((x3 - x1) - W(4, 1) * (x2 - x0)),
     W(4, 0) * ((x1 + x3) - (x0 + x2)),
     W(4, 0) * (W(4, 1) * (x2 - x0) + (x3 - x1)),
     W(4, 0) * ((x0 + x2) + (x1 + x3))
    ]

The lack of control logic in the combinational FFT components causes the circuit 
outputs to be polynomials in the inputs and twiddle factors only. Rewriting of 
the expressions by simplifying away twiddle factors that are equal to 0 or 
conversion of the remaining twiddle factors to the form and restructuring 
of arithmetic expressions to sum of products form makes it possible to show the 
two results equal by syntactic equality alone. 

The rewriting has to be done in a particular way for it to be applicable to the 
larger circuits. If the axioms are given as standard equalities, they can be used 
in both directions. This is not how the most efficient proof would proceed, as it 
suffices to use all the rules in one direction only: expand out the polynomials, 
take away trivial twiddle factors and rewrite the others. 

Unidirectional rules are therefore more suitable for our purposes. The
theorem prover Otter has efficient rules of this kind, called demodulators [MW97];
the use of a demodulation rule can be unconditional or restricted by predicates
on terms. An important property of these rules is that they are used as often as
possible without accumulating intermediate results. This reduces the number of
consequences and makes normalisation of large expressions tractable.
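The effect of one-directional normalisation can be imitated in a few lines. The sketch below is not Otter's demodulation; it swaps in a simple polynomial normal form (a dictionary from sorted monomials to integer coefficients) to show how expanding products and removing trivial twiddle factors reduces equality checking to a syntactic comparison. All helper names, and the choice to reduce W(N, k) to lowest terms, are assumptions made for this illustration.

```python
from math import gcd

def const(c):
    return {(): c} if c else {}

def var(name):
    return {(name,): 1}

def twiddle(n, k):
    """W(n, k) as a polynomial: 1 when the exponent is trivial, else a canonical atom."""
    k %= n
    if k == 0:
        return const(1)
    g = gcd(n, k)
    return var(f"W({n // g},{k // g})")

def add(p, q):
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + c
    return {m: c for m, c in out.items() if c != 0}

def sub(p, q):
    return add(p, {m: -c for m, c in q.items()})

def mul(p, q):
    """Expand a product into sum-of-products form."""
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            mono = tuple(sorted(m1 + m2))
            out[mono] = out.get(mono, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

x0, x1, x2, x3 = (var(f"x{i}") for i in range(4))

# First output of each size-4 circuit, as printed by symbolic evaluation.
radix2_out = sub(sub(x3, mul(twiddle(2, 0), x1)),
                 mul(twiddle(4, 1), sub(x2, mul(twiddle(2, 0), x0))))
radix22_out = mul(twiddle(4, 0),
                  sub(sub(x3, x1), mul(twiddle(4, 1), sub(x2, x0))))
```

Because every constructor returns a value already in normal form, no intermediate results accumulate, and `radix2_out == radix22_out` is an ordinary dictionary comparison.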

The demodulation proof rules are specified inside Lava and passed to Otter as
two theories. The actual proofs are done by calling the verification interpretation
on the test bench and the proof configuration:

    options = [Prover otter, Theory arithmetic, Theory (twiddle 4)]

    Lava> verify options (fftSame 1)
    Valid

In this way the equivalence of circuits up to size 256 is proven automatically. 
Statistics for the resulting proofs and some system formula measures such as the 
number of primitive logical and arithmetic operations are given in table 1. The 
running times are measured on a 300 MHz Sun Enterprise 450. 
Table 1. Statistics for verification of equivalence between combinational FFTs

FFT size   Verification time (s)   Formula size (bytes)   # of variables   # of formula operations
4          0.09                    1179                   33               59
16         0.39                    10 761                 233              433
64         10.31                   172 088                1334             2529
256        827                     2 886 561              6939             13 313



5.3 Pipelined FFT Verification 

We would now like to verify that the sequential pipelined implementation of the
Radix-$2^2$ FFT is equivalent to the combinational circuit. We employ a strategy that is
optimised for equivalence checking of combinational and constant delay ("lag")
pipelined circuits.

The presentation is divided into two parts: the first part describes the
strategy and the second demonstrates how it applies to the particular case of our
FFT verification.



A strategy for pipeline equivalence proofs. If we observe the pipelined
circuit for a single clock period, it is a function from a starting state $S$ and input
$I$ to a finishing state $S'$ and a resulting output $O$.

$$(O, S') = ppl(I, S)$$

We use the term "frame" to refer to a complete in- or output data sequence
for the combinational or pipelined circuit. Lava can generate a defining formula
$\delta_{ppl}(I, S, O, S')$ for the $ppl(I, S)$ transition function that captures how the circuit
behaves over a single clock tick. The objective is to show equivalence between 
the two implementations for any number of successive frames starting from a 
(partially) specified initial state, using the following verification strategy which 
we refer to as Equiv^^^: 

1. Generate the defining formula 6ppi[I^ S', O, S') of the pipelined circuit. 

2. Define I to be the number of inputs that the pipelined circuit has to consume 
before it can read the first input of the second frame. 

3. Define m as the least number of time steps that the pipelined circuit has to 
run to allow an observer to deduce that the output from the sequential circuit 
matches a single frame of output from the combinational implementation. 

4. Let k = max(/,m). 

5. Let be the following formula that expresses what behaviour a length k 
trace of the sequential circuit exhibits 



6ppi{Io,So,Oo,Si)A6ppi{IuSuOuS2)A...A6ppi{h.uSk.uOk.uSk) 

This is the A:-step unrolling of the pipelined transition function. 

We refer to a trace that is a model for $T$ as a $T$ trace, and observe the
following:

— If we define an initialisation state as a state that immediately precedes
the processing of a new frame, both $S_0$ and $S_l$ are initialisation states
on all $T$ traces. Furthermore, $S_l$ is the closest initialisation state to $S_0$.

— Any infinite trace of the system is made up from infinitely many
concatenated $T$ traces; given that $l < k$, successive traces $tr_i$ and $tr_{i+1}$
also overlap, with $tr_i(l \ldots k-1) = tr_{i+1}(0 \ldots k-l-1)$.

6. Generate a defining formula for the combinational circuit, $\delta_{cmb}(I, O)$.

7. From $T$ and $\delta_{cmb}$ construct a formula $A$ that expresses implementation
equivalence for a single frame of inputs.

8. A proof of $A$ without any assumptions at all on the initialisation state $S_0$
implies $\forall S_0 . A$. This corresponds to equivalence for any number of time
frames, as the circuits will behave in the same way regardless of the
initialisation state values before a new frame is processed; a direct proof of $A$
is hence not realistic. Therefore strengthen the assumptions on $S_0$ by a formula
$\varphi$ that restricts some of the $S_0$ variables to the initial values given in the
pipelined circuit description. If now

$\varphi(S_0) \rightarrow A$

is provable, the circuits are equivalent for any number of time frames under
the assumption that $\varphi$ is always true in initialisation states. Refer to this
assumption as assumption A.

9. Try to prove assumption A valid by a proof of

$\varphi(S_0) \land T \rightarrow \varphi(S_l)$

As $\varphi$ holds in the initial state of the circuit, this formula implies assumption A,
as it asserts that $\varphi$ will hold in the state $S_l$ (which is reached immediately
before a new processing cycle is initiated) if $\varphi$ is true in $S_0$ (which was reached
immediately before this frame was processed); assumption A is therefore entailed
by induction.

10. If step 8 and step 9 were successful, deduce multi-frame equivalence.
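
The two proof obligations of the strategy can be illustrated on a very small
example. The following Python sketch is not the Lava proof script; it replaces
the theorem prover by exhaustive enumeration, and the toy circuits (`cmb`,
`ppl`), the restriction `phi` and the constants `L`, `LAG` and `K` are all
assumptions invented for this illustration.

```python
from itertools import product

L = 2        # inputs consumed per frame (step 2)
LAG = 1      # constant output lag of the toy pipeline
K = L + LAG  # trace length k (step 4)

def cmb(x0, x1):
    """Combinational reference: one frame in, one frame out."""
    return (x0 ^ x1, x0 & x1)

def ppl(i, s):
    """One clock tick of the toy pipeline: (O, S') = ppl(I, S)."""
    c, r, q = s                  # counter bit, input register, output register
    if c == 0:
        return q, (1, i, q)      # emit pending output, latch first input
    return r ^ i, (0, r, r & i)  # emit first result, keep second for later

def unroll(inputs, s0):
    """k-step unrolling: outputs O_0..O_{k-1} and states S_0..S_k."""
    states, outs = [s0], []
    for i in inputs:
        o, s = ppl(i, states[-1])
        outs.append(o)
        states.append(s)
    return outs, states

def phi(s):
    """Restriction on initialisation states: the counter is 0."""
    return s[0] == 0

def traces(restricted):
    """All (inputs, S_0) pairs, optionally only those satisfying phi(S_0)."""
    for s0 in product((0, 1), repeat=3):
        if restricted and not phi(s0):
            continue
        for inputs in product((0, 1), repeat=K):
            yield inputs, s0

def frame_equivalent(inputs, s0):
    """Single-frame equivalence: lagged pipeline outputs match cmb."""
    outs, _ = unroll(inputs, s0)
    return tuple(outs[LAG:K]) == cmb(*inputs[:L])

def step8():
    """phi(S_0) -> A, checked over all traces."""
    return all(frame_equivalent(i, s) for i, s in traces(True))

def step9():
    """phi(S_0) and T -> phi(S_l), checked over all traces."""
    return all(phi(unroll(i, s)[1][L]) for i, s in traces(True))

def unrestricted():
    """A with no assumption on S_0 at all."""
    return all(frame_equivalent(i, s) for i, s in traces(False))
```

In this toy setting `step8()` and `step9()` both succeed, while `unrestricted()`
fails because a trace may start in the middle of a frame; this mirrors the
reason for strengthening the assumptions on the initialisation state.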

A valid question is, of course, “Why is it reasonable to assume that a part of
the pipelined circuit always is in a state where $\varphi$ holds before a new frame is
read?”. This is probable as the pipelined circuit is supposed to repeat the frame
processing behaviour again; the registers in the control logic should therefore
have similar contents in the initialisation states as in the specified initial circuit
state.

By having reduced the problem to two simple proofs we have devised a simple
strategy for showing pipelined circuits with a fixed lag equivalent to
combinational implementations. This strategy is implemented in an automatic Lava
proof script that is parameterised over circuit descriptions, frame length, the
constant lag and a proof configuration for the frame equivalence proof. This script
automatically generates and reduces all formulas as much as possible before calling
the theorem prover specified in the proof configuration; the only manual steps
are to choose which state variables to restrict and to select a proof procedure.
Any prover and extra proof options can be specified in the proof configuration;
the pipelined circuit description can also have as many or as few initial values
given as desired.

Application to the pipelined Radix-$2^2$ FFT. The script that implements
Equiv proves pipeline equivalence for the FFT circuits with the automatically
generated equivalence formula $A$ defined as

$T(I_0..I_{k-1}, S_0..S_k, O_0..O_{k-1}) \land \delta_{cmb}(I_0..I_{l-1}, O'_0..O'_{l-1}) \rightarrow O_{lag}..O_{k-1} = O'_0..O'_{l-1}$

where $lag = 2^n - 1$, $l = 2^n$ and $k = 2^n + lag$.

A sufficient restriction $\varphi$ on the initial state of the pipelined FFT circuit is
that the n-bit counter is initialised to 0. The reason why this simple assertion
is strong enough to prove the FFT implementations equivalent is that at
re-initialisation the rest of the pipeline state is unimportant: new values have to
be read for processing anyway. This is likely to hold for most pipelined
implementations of combinational circuits.

The initialisation information $\varphi$ is always used by the Lava script to reduce
the generated formulas as much as possible while they are produced. This
reduction computes the values of logical expressions whenever possible and propagates
the resulting new information. As a consequence, the formulas that specify the
behaviour of the control logic inside the pipelined FFT are evaluated away and
the re-initialisation invariant in step 9 of Equiv is proved by syntactic equality.
The equivalence checking problem for the pipelined FFT is therefore reduced
back to a proof of an equivalence formula that turns out to be amenable to
normalisation with the theories used for the combinational equivalence checking.
The complexity of the resulting proofs is indicated in Table 2.



Table 2. Statistics for verification of pipelined equivalence

FFT size   Verification time (s)   Formula size (bytes)
4          0.05                    1227
16         0.61                    10 045
64         22.26                   162 862
256        1361                    2 797 617
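
The formula reduction that the script performs while formulas are produced can
be pictured as partial evaluation under known signal values. The sketch below
is only an illustration in Python, not Lava's implementation; the tuple
encoding of formulas, the function name `simplify` and the signal names `c0`,
`x` and `y` are assumptions.

```python
def simplify(e, env):
    """Partially evaluate a Boolean formula under the known values in env.

    Formulas are bool constants, variable names (str), ("not", a),
    ("and", a, b) or ("or", a, b). Unknown variables are left symbolic.
    """
    if isinstance(e, bool):
        return e
    if isinstance(e, str):
        return env.get(e, e)
    op, *args = e
    args = [simplify(a, env) for a in args]
    if op == "not":
        a = args[0]
        return (not a) if isinstance(a, bool) else ("not", a)
    a, b = args
    absorbing = (op == "or")  # True absorbs "or", False absorbs "and"
    if a is absorbing or b is absorbing:
        return absorbing
    if isinstance(a, bool):   # a is the neutral element, so only b matters
        return b
    if isinstance(b, bool):   # b is the neutral element, so only a matters
        return a
    return a if a == b else (op, a, b)

# A multiplexer controlled by a counter bit c0 (hypothetical control logic).
mux = ("or", ("and", "c0", "x"), ("and", ("not", "c0"), "y"))
```

With the counter bit fixed by the initialisation information, as in
`simplify(mux, {"c0": False})`, the control logic disappears and only the data
signal `y` remains, which is the effect described above for the control logic
of the pipelined FFT.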



5.4 Manual Preparation

Approximately two weeks were spent on studying the FFT implementations,
devising signal processing combinators and writing circuit descriptions. The addition
of support in Lava’s interpretations for complex numbers and the writing of the
symbolic simulation interpretation with automatic formula reduction took one
week of work each.

Finding the proof procedure was the creative step for the combinational FFT
verification. Two other theorem provers, Prover [Sta89] and Gandalf [Tam97],
were tried before Otter. Prover lacked crucial arithmetic laws, and Gandalf did
not support the unidirectional rules that were needed to make the proofs scale up.
A correct set of rewrite rules took some hours' work by two users, Koen Claessen
and Tanel Tammet, who were unfamiliar with the FFT but knew Otter well.
Any other applicable proof procedure would also have needed rewrite rules for
the twiddle factors, so we believe that this degree of manual work is unavoidable.

Once the symbolic simulation interpretation with formula reduction was written,
a first (more involved) pipeline proof script could be constructed in half an
hour. This strategy was successful the first time it was tried; we later simplified
the heuristic to the presented form. The only non-reusable steps of the
combinational and pipelined verifications were to choose Otter with rewrite rules as
the proof procedure and to restrict the synchronisation counter state to the initial
state 0.



6 Lessons Learned 

The FFT circuits are representatives for a general class of circuits that compute 
complex functions without using a large amount of boolean control logic. In 
general, a few guidelines for proofs of circuit equivalence for such circuits can be 
drawn out of the FFT work: 

— For each problem domain, it might be possible to find a small number of 
generalised proof scripts that can be powerful enough for a particular class 
of problems to make proofs automatic in most cases. These scripts should
be parametrisable by proof options so that they are not too blunt to be
reusable.

— As the proofs that have to be done when operations like arithmetic are 
involved are relatively complex, the prover’s job must be simplified as much 
as possible. The use of automatic partial evaluation and formula reduction 
can in some cases lessen the need for prover inferences drastically. A tool like 
Lava that supports analyses like simplification of formulas by propositional 
reasoning and cone-of-influence analysis can help the designer simplify the 
problem at hand. 

— It is not always necessary to explore the state space of a design. Ordinary 
induction can sometimes avoid very complex or intractable computations, 
and make for uncomplicated proofs. 

— Normal form rewriting is a powerful technique that can be implemented very 
efficiently using modern rewrite engines. However, the use of unidirectional 
rules is crucial to make the strategy applicable to larger circuits. 
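
To make the last guideline concrete, the following Python sketch normalises
products of twiddle factors with rules that are only ever applied from left to
right, and decides equality by comparing normal forms. It is an illustration
only: the rule set, the term encoding and the names `rewrite_root`,
`normalise` and `equal` are assumptions, not the rewrite theory that was given
to Otter.

```python
N = 8  # W denotes a primitive N-th root of unity; ("w", k) stands for W^k

def rewrite_root(t):
    """Apply one oriented rule at the root of t, or return None."""
    if t[0] == "w":
        k = t[1] % N
        if k == 0:
            return ("one",)                       # W^0 -> 1
        if k >= N // 2:
            return ("neg", ("w", k - N // 2))     # W^(k+N/2) -> -(W^k)
        if k != t[1]:
            return ("w", k)                       # reduce exponent mod N
    if t[0] == "neg" and t[1][0] == "neg":
        return t[1][1]                            # -(-a) -> a
    if t[0] == "mul":
        a, b = t[1], t[2]
        if a == ("one",):
            return b                              # 1 * b -> b
        if b == ("one",):
            return a                              # a * 1 -> a
        if a[0] == "neg":
            return ("neg", ("mul", a[1], b))      # (-a) * b -> -(a * b)
        if b[0] == "neg":
            return ("neg", ("mul", a, b[1]))      # a * (-b) -> -(a * b)
        if a[0] == "w" and b[0] == "w":
            return ("w", a[1] + b[1])             # W^j * W^k -> W^(j+k)
    return None

def normalise(t):
    """Normalise subterms first, then rewrite at the root until stable."""
    t = (t[0],) + tuple(normalise(x) if isinstance(x, tuple) else x
                        for x in t[1:])
    r = rewrite_root(t)
    return t if r is None else normalise(r)

def equal(a, b):
    """Two terms are considered equal iff their normal forms coincide."""
    return normalise(a) == normalise(b)
```

For example, `normalise(("mul", ("w", 3), ("w", 5)))` yields `("one",)`.
Because every rule is used in one direction only, no search is needed: each
term is driven to its normal form and the comparison is purely syntactic.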



7 Related Work 

The Radix-2 FFT algorithm has previously been verified against the DFT using
the ACL2 theorem prover [Gam98]. The level of abstraction in this verification
was high and the proof thus required substantial user interaction. In contrast,
we have aimed for fully automatic proofs, and verified the hardware FFTs at
the netlist level. Our proofs are only for equivalence of fixed size circuits, but
are not reliant on circuit regularity.

The pipeline proof principle bears some resemblance to the refinement
mapping approach to pipelined microprocessor control verification [BD94,Cyr93].
However, as we are comparing a pipelined circuit against a combinational one,
we cannot directly associate a single sequential step with the combinational
implementation; we instead correlate whole frames. We also exploit the fact that
constant lag pipelined circuits are targeted.

There are alternatives to Otter as a proof procedure: the Stanford Validity
Checker (SVC) decides quantifier-free first-order logic with linear arithmetic and
uninterpreted functions by boolean case splitting (backtracking), rewrites and
congruence closure [BDL96]. SVC has been used extensively in hardware verification,
and is used as the decision procedure in the Burch and Dill approach to
microprocessor verification [BD94]. Multiway decision graphs (MDGs) are a variation
on the ROBDD theme that accommodates abstract data types, uninterpreted function
symbols and rewrite rules [ZSC+95]; this data structure has been used to
verify non-pipelined microprocessors and an ATM switch [TZS+96]. MDGs give
a canonical representation for a fragment of quantifier-free first-order formulas
and support exploration of abstract state spaces (but do not guarantee
convergence of fixpoint computations). As we have demonstrated, it is not always
necessary to do such expensive computations; induction and normalising can be
both sufficient and efficient.

Both MDGs and SVC need the user to provide rewrite rules or a normaliser
for new theories. This means that the manual step of finding a normal form for
twiddle factors is also necessary with these proof procedures.

8 Conclusions 

This paper has shown how some FFT circuits have been verified from within the
hardware development tool Lava after the existing system was extended with
complex numbers and a general purpose strategy for equivalence checking of
combinational and fixed lag pipelined circuits. The verification has been
automatic in the sense that the only manual proof steps have been to select the proof
procedure, rewrite rules and the initial state variables to restrict. The proofs are
at a relatively low level, which should give a high confidence in the correctness
of the modelled circuits; the logical formulas have been generated by symbolic
evaluation of the hardware descriptions. No part of the verification has relied on
the specific way that the arithmetic operators are implemented, or the
representation of complex numbers. However, the proofs are not general in the size of
the FFT; different instances have to be proved separately.

We have also presented an induction principle that exploits the problem 
structure of equivalence checking between a pipelined circuit and a combinational 
reference circuit, and contributed some suggestions for verification of circuits 
that contain little control logic but do complicated computations expressed in 
abstract operations.

9 Future Work

Lava is optimised for developing and verifying hardware. We pay for the strength 
we gain by limiting the problem domain, however, by presently being unable to 
reason internally about the proof strategies. Instead we have to go outside the 
system to a general purpose interactive theorem prover and do high level proofs 
there. We would like to have Lava integrated with a proof system that would 
allow us to do this kind of reasoning. 

The counter examples that are produced by proof procedures are formatted
and passed back to the user by Lava; unfortunately many first order logic
theorem provers (including Otter) lack such capabilities. For verification with normal
form rewriting to be smooth, it must be easy to find a rewriting theory quickly.
It is therefore imperative to have some tool that analyses the output of a failed
proof and allows the user to deduce what rules are missing, or gives the user
good clues to why the two formulas are not equivalent. This is something that
should (and will) be implemented in Lava as a proof analysis.

A Appendix

A.1 The Radix-2 FFT Description

Figure 3 shows a size 16 Radix-2 FFT network, where merging arrows indicate
addition and constants under a wire indicate multiplication. The Lava
description of the size $2^n$ Radix-2 FFT circuit follows the network structure closely,
and is parametrised by n:

radix2 n =
  bitRev n >-> compose [ stage i | i <- [1..n] ]
  where
    stage i = raised (n-i) two (twid i >-> bflys (i-1))
    twid i  = one (decmap (2^(i-1)) (wMult (2^i)))

The FFT circuit is made up from the sequential composition of an initial bit
reversal permutation network (not shown in the picture) and n circuit stages.
Stage i is a column of $2^{n-i}$ components that each contains a twiddle factor
multiplication stage sequentially composed with a butterfly network. Given that
$x = 2^{i-1}$, a size i multiplication stage performs multiplications with $x$ twiddle
factors (powers of $W_{2x}$) on the respective wires of one half of a bus, while
passing the other half through unchanged.
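
For readers without access to Lava, the same network structure can be sketched
numerically in Python and compared against the DFT. This is only an
illustration under assumptions: the function names are invented, the twiddle
factors are taken to be powers of $e^{-2\pi i/2^i}$ applied to the upper half
of each bus, and the floating point comparison is of course not a proof.

```python
import cmath

def bit_reverse(xs, n):
    """Bit reversal permutation on the n-bit indices of xs."""
    return [xs[int(format(j, f"0{n}b")[::-1], 2)] for j in range(len(xs))]

def stage(xs, i):
    """Stage i: on every bus of size 2^i, twiddle one half, then butterfly."""
    size = 2 ** i
    half = size // 2
    out = list(xs)
    for base in range(0, len(xs), size):
        for j in range(half):
            a = xs[base + j]
            # twiddle factor multiplication on the upper half of the bus
            b = xs[base + half + j] * cmath.exp(-2j * cmath.pi * j / size)
            # butterfly between wire j and wire j + half
            out[base + j] = a + b
            out[base + half + j] = a - b
    return out

def radix2(xs, n):
    """Bit reversal followed by stages 1..n, for an input of length 2^n."""
    ys = bit_reverse(xs, n)
    for i in range(1, n + 1):
        ys = stage(ys, i)
    return ys

def dft(xs):
    """Direct evaluation of the discrete Fourier transform."""
    size = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * j * k / size)
                for j, x in enumerate(xs))
            for k in range(size)]
```

Calling `radix2(xs, 4)` on a list of 16 complex numbers gives the same result
as `dft(xs)` up to rounding error.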

More information on the signal processing building blocks and the
descriptions of the combinational circuits can be found in [BCSS98].

Fig. 3. The structure of a size 16 Radix-2 FFT

Abstract. We present a new algorithm for detecting semantic
combinational cycles that is simpler and more efficient than earlier algorithms
found in the literature. Combinational circuits with syntactic cycles often
arise in processor and bus-based designs. The intention is that external
inputs and delay elements such as latches break these cycles, so that
no “semantic” cycles remain. Unbroken semantic cycles are considered a
design error in this context. Such unbroken cycles may also occur
inadvertently in compositions of Mealy machines.

Verification systems that accept semantically cyclic definitions run the
risk of certifying systems that have electrically bad or unexpected
behavior, while those that prohibit all cyclic definitions constrain the types
of systems that can be subjected to formal verification. Earlier work on
this issue has led to a reasonable condition, called Constructivity, that
guarantees the absence of semantic cycles. This formulation is, however,
computational in nature, and existing algorithms to decide
constructivity are somewhat inefficient. Moreover, they do not apply naturally
to circuit definitions in high-level languages that allow variables with
non-Boolean types. We propose a new formulation of constructivity,
formulated as a satisfiability question, that does not have these
limitations. We have implemented the new algorithm in the verification tool
COSPAN/FormalCheck. Our experience indicates that the algorithm is
simple to implement and usually incurs negligible overhead.



1 Introduction 

A circuit may be described as a set of definitions, one for each gate of the circuit. 
For most circuits, the induced syntactic dependency graph of such a definition is 
acyclic. Syntactically cyclic definitions, however, occur in many contexts in digi- 
tal design: Malik [9] points out that it is often desirable to re-use functional units 
by connecting them in a cyclic fashion through a routing mechanism, and Stok 
[13] notes that such definitions often arise in the output of synthesis programs. 
In these cases, the intention is that the routing mechanism can be controlled 
through external inputs, so that any “semantically” cyclic paths are broken for 
each valuation of the external “free” inputs and delay elements such as latches. 
Semantically cyclic definitions may also occur inadvertently in systems composed
of several Mealy machines, from feedback connections between the combinational
inputs and outputs. Verification systems that accept semantically cyclic
definitions run the risk of certifying systems that have behavior that is unexpected
or electrically bad, while those that prohibit syntactically cyclic definitions
constrain the types of systems that can be subjected to formal verification.

Most current design and verification systems either prohibit all syntactically 
cyclic definitions, or accept only some of the semantically acyclic definitions. The 
Esterel compiler is the only existing system we know of that analyzes definitions
for semantic cyclicity using the notion of “Constructivity” proposed by Berry [2],
which considers a circuit to be semantically acyclic iff for every external input,
a unique value can be derived for each internal wire by a series of inferences on 
the definition of the circuit (a precise statement is given in Section 2). Shiple 
[11] shows that constructive definitions are precisely those that are well-behaved 
electrically, for any assignment of delay values, in the up-bounded inertial delay 
model [4]. 

It is inefficient to check constructivity by enumerating all possible external
valuations. Symbolic algorithms for checking constructivity [2,12,11] manipulate
sets of input valuations, representing them with BDD’s [3]. This manipulation 
is based on simultaneous fixpoint equations derived from the circuit definitions 
and the types of the variables. For variables with k values in their type, these 
algorithms require k sets of valuations for each variable. Moreover, for arithmetic 
operations, the fixpoint equations are constructed from partitions (for +) or 
factorizations (for *) of all numbers in the type. Thus, these algorithms are 
somewhat inefficient and difficult to implement for variables with non-Boolean 
types. 

We show in this paper that, by a simple transformation, one can reformulate 
constructivity as the satisfiability of a set of equations derived from the
definitions, over variable types extended with a value $\bot$ (read as “bottom”). This
formulation is non-computational and easily extensible to variables with any
finite type. The formulation also handles definitions of indexed variables in the
same manner. We have implemented this constructivity check in the verification 
tool COSPAN [7], which is the verification engine for the commercial verification 
tool FormalCheck; the implementation is simple, and our experience indicates 
that it usually incurs negligible overhead. 

Section 2 motivates and precisely defines constructivity. The new formula- 
tion is derived in Section 3. Section 4 describes the implementation of this idea 
in the COSPAN/FormalCheck verification system. The paper concludes with a
discussion of related work and future directions in Section 5. 

2 Cyclic Definitions 

Notation: The notation generally follows the style in [6]. Function application
is represented with a “.” and is right-associative; for instance, f.g.a is parsed
as f.(g.a). Quantified expressions and those involving associative operators are
written in the format (Q x : r.x : g.x), where Q is either a quantifier (e.g.,
$\forall$, $\exists$) or an associative operator (e.g., +, *, min, max, lub, glb),
x is the “dummy” variable, r.x is the range of x, and g.x is the expression. For
instance, $(\forall x\ r(x) \Rightarrow g(x))$ is expressed as $(\forall x : r.x : g.x)$,
$(\exists x\ r(x) \land g(x))$ is expressed as $(\exists x : r.x : g.x)$, and
$\sum_{i=0}^{n} x_i$ is expressed as $(+ i : i \in [0,n] : x.i)$. When the range r is
true or understood from the context, we drop it and write (Q x :: g.x). Proofs are
presented as a chain of equivalences or implications, with a hint for each link of
the chain. □

For simplicity, we consider all variables to be defined over a single finite type
T. The vocabulary of operator symbols is given by a finite set F. Each symbol
in F has an associated “arity”, which is a natural number. A symbol f with
arity n corresponds to a function $f^* : T^n \to T$; symbols with arity 0 correspond
to values of T. Terms over F and a set of variables X are built as follows: a
variable x in X is a term, and for terms t.i ($i \in [0,n)$) and a function symbol f
of arity n, $f.(t.0, \ldots, t.(n-1))$ is a term.

Definition 0 (Simultaneous definition). A simultaneous definition is
specified by a triple (E, X, Y), where X and Y are disjoint finite sets of variables,
E is a set of expressions of the form y ::= t, where $y \in Y$ and t is a term in
$X \cup Y$, such that there is exactly one expression in E for each variable in Y.

In terms of the earlier informal description of a circuit as a set of defini- 
tions, X is the set of “external” variables (the free inputs and latches) and 
Y is the set of “internal” variables (the internal gate outputs); notice that a 
simultaneous definition contains definitions only for the internal variables. A
simultaneous definition induces a dependency relation among the variables in Y;
for each expression y ::= t, y “depends on” each of the variables appearing in
t. A simultaneous definition is syntactically cyclic iff this dependency relation
contains a cycle. We illustrate some of the subtleties in formulating a correct 
notion of semantic acyclicity with a few examples. 

Example 0 : Syntactic Acyclicity 

The external variable set is {x, y} and the internal variable set is {p, q}.

$p ::= x \land \lnot y$
$q ::= x \lor y$

This is syntactically acyclic; hence, for every valuation of x and y, p and q have
uniquely defined values. □

Example 1 : Syntactic Cyclicity, Semantic Acyclicity 

The external variable set is {x, y} and the internal variable set is {p, q}. 

p ::= if x then y else q
q ::= if x then p else x

This is syntactically cyclic; however, notice that if x is true, the definition
simplifies to the acyclic definition:

p ::= y
q ::= p

Similarly, the simplified definition is acyclic when x is false. Thus, each setting
of the external variable x breaks syntactic cycles. □

Example 2 : Semantic Cyclicity 

The external variable set is {x} and the internal variable set is {p, q}.

$p ::= q \land x$
q ::= p

This is syntactically cyclic. If x is false, the simplified definition is acyclic;
however, when x is true, it simplifies to one that presents a semantic cycle:

p ::= q 
q ::= p 

□ 



A plausible semantics for a simultaneous definition is to interpret each
expression y ::= t as an equation y = t and declare the definition to be semantically
acyclic if this set of simultaneous equations has a solution for each valuation
of the external variables. With this semantics, Examples 0 and 1 are
semantically acyclic, but so is Example 2. One may attempt to rectify this situation by
requiring there to be a unique solution for each input valuation; the following
example illustrates that this is also incorrect.

Example 3: Incorrectness of the “unique solution” criterion.

The external variable set is {x} and the internal variable set is {p, q}.

$p ::= p \land x$

q ::= if p then $\lnot q$ else false

This is syntactically cyclic. If x is false, the simplified definition is acyclic, and
hence has a unique solution. If x is true, the simplified definition is the following.

p ::= p

q ::= if p then $\lnot q$ else false

This has the unique solution p = false, q = false. Hence, the definition has a
unique solution for each valuation of x! The “unique solution” criterion thus
leaves the cycles p ::= p, $q ::= \lnot q$ undetected. □
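
The solution counts claimed for Examples 2 and 3 can be reproduced by brute
force. The Python sketch below is only an illustration; the encoding of each
right-hand side as a function of an environment, and the names `solutions`,
`example2` and `example3`, are assumptions.

```python
from itertools import product

def solutions(defs, ext):
    """All Boolean assignments to the internal variables satisfying y = t."""
    names = sorted(defs)
    found = []
    for values in product((False, True), repeat=len(names)):
        env = dict(ext, **dict(zip(names, values)))
        if all(env[y] == rhs(env) for y, rhs in defs.items()):
            found.append(dict(zip(names, values)))
    return found

# Example 2: p ::= q and x, q ::= p
example2 = {
    "p": lambda e: e["q"] and e["x"],
    "q": lambda e: e["p"],
}

# Example 3: p ::= p and x, q ::= if p then not q else false
example3 = {
    "p": lambda e: e["p"] and e["x"],
    "q": lambda e: (not e["q"]) if e["p"] else False,
}
```

With x true, Example 2 has two solutions and is rejected by the “unique
solution” criterion, whereas Example 3 has exactly one solution for either
value of x and is therefore wrongly accepted.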

The examples suggest that a straightforward formulation in terms of
solutions to the simultaneous equations may not exist. Berry [2], strengthening a
formulation of Malik [9], proposed a condition called Constructivity.
Constructivity is based on the simplification process that was carried out informally in
the examples above: for each valuation of the external variables, one attempts to
simplify the right hand sides of the definitions. If a term t in a definition y ::= t
simplifies to a constant a, the current valuation is extended with y = a and the
definition y ::= t is removed. The simplifications are restricted to cases where
the result is defined by the current valuation irrespective of the values of
variables that are currently undefined. For instance, with {x = false} as the current
valuation, if x then y else z simplifies to z; $x \land y$ simplifies to false; but
$y \lor \lnot y$ does not simplify to true. Berry [2] shows that this process produces a
unique result, independent of the order in which simplification steps are applied. The
appropriateness of constructivity is shown by Shiple [11], who demonstrates that
constructive definitions are precisely those that are well-behaved electrically, for
any assignment of delay values, in the up-bounded inertial delay model [4]. Malik
[9] shows that the problem of detecting semantic cyclicity is NP-complete.

Definition 1 (Constructivity). A simultaneous definition is semantically
acyclic iff for each valuation of the external variables, the simplification process
leads to an empty set of definitions.
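
The simplification process of Definition 1 can be sketched for Boolean
definitions as follows. This Python fragment enumerates the external
valuations one by one, so it illustrates the definition rather than an
efficient check; the term encoding, the names `ev`, `simplify` and
`constructive`, and the choice to leave an `if` undetermined whenever its
condition is undefined are all assumptions.

```python
from itertools import product

def ev(t, val):
    """Value of term t under the partial valuation val, or None if the
    valuation does not yet determine it."""
    if isinstance(t, bool):
        return t
    if isinstance(t, str):
        return val.get(t)
    op, *a = t
    if op == "not":
        x = ev(a[0], val)
        return None if x is None else not x
    if op == "and":
        x, y = ev(a[0], val), ev(a[1], val)
        if x is False or y is False:
            return False
        return None if x is None or y is None else True
    if op == "or":
        x, y = ev(a[0], val), ev(a[1], val)
        if x is True or y is True:
            return True
        return None if x is None or y is None else False
    if op == "if":
        c = ev(a[0], val)
        return None if c is None else ev(a[1] if c else a[2], val)
    raise ValueError(op)

def simplify(defs, ext):
    """Run the simplification process; return the definitions left over."""
    val, pending = dict(ext), dict(defs)
    progress = True
    while progress:
        progress = False
        for y, t in list(pending.items()):
            v = ev(t, val)
            if v is not None:       # t simplified to a constant
                val[y] = v          # extend the valuation with y = v
                del pending[y]      # and remove the definition
                progress = True
    return pending

def constructive(defs, externals):
    """Definition 1: no definitions remain for any external valuation."""
    return all(not simplify(defs, dict(zip(externals, vs)))
               for vs in product((False, True), repeat=len(externals)))

ex1 = {"p": ("if", "x", "y", "q"), "q": ("if", "x", "p", "x")}
ex2 = {"p": ("and", "q", "x"), "q": "p"}
ex3 = {"p": ("and", "p", "x"), "q": ("if", "p", ("not", "q"), False)}
```

Under these assumptions `constructive(ex1, ["x", "y"])` holds, while Examples 2
and 3 are both rejected because definitions remain when x is true.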

3 Constructivity as Satisfiability 

There is another way of viewing the simplification process that leads to our 
new formulation. Simplification is seen as a fixpoint process that computes the 
“maximal” extension of the original valuation of external variables (maximal 
in the sense that the set of definitions cannot be simplified further with this 
valuation). The algorithms for checking constructivity proposed in [9,12] use 
this fixpoint formulation. We show (Theorem 1 below) that it is possible to
recast the fixpoint formulation as a satisfiability question. This observation lets us
develop a simple algorithm for constructivity that extends easily to non-Boolean 
types. 

3.1 Background 

To formulate simplification as a fixpoint process, we need some well-known
concepts from Scott’s theory of Complete Partial Orders (CPO’s) [10]. The type T
is extended with a new element $\bot$ (read as “bottom”) to form the type $T_\bot$.
$T_\bot$ is equipped with the partial order $\sqsubseteq$ defined by $a \sqsubseteq b$ iff
$a = b$ or $a = \bot$. Note that $\sqsubseteq$ is a CPO (every sequence of elements that is
monotonically increasing w.r.t. $\sqsubseteq$ has a least upper bound). The greatest
lower bound (glb) of two elements a, b is defined as: glb.(a, b) = if $a \neq b$ then
$\bot$ else a. The ordering $\sqsubseteq$ is extended point-wise to vectors on $T_\bot$ by
$u \sqsubseteq v$ iff $|u| = |v| \land (\forall i :: u.i \sqsubseteq v.i)$.
This ordering is a CPO on the set of vectors on $T_\bot$. The greatest lower bound
is also defined point-wise over vectors of the same length: glb.(u, v) = w, where
for every i, w.i = glb.(u.i, v.i).

For each function symbol f in F, $f_\bot$ is a symbol of the same arity that
indicates application to $T_\bot$ rather than to T. The interpretation $f^*_\bot$ of
$f_\bot$ over $T_\bot$ should be a function that extends $f^*$ and is monotone w.r.t.
the order $\sqsubseteq$; i.e., for vectors u, v of length the arity of $f_\bot$,
$u \sqsubseteq v$ implies $f^*_\bot.u \sqsubseteq f^*_\bot.v$. The ordering $\sqsubseteq$
and the monotonicity condition encode the informal description of $\bot$ as the
“undefined” value: if v is “more defined” than u, then $f^*_\bot.v$ should
also be “more defined” than $f^*_\bot.u$. The extension of a term t is represented
by $t_\bot$ and is defined recursively based on the structure of the term:
$(x)_\bot = x$; $(f.(t.0, \ldots, t.(n-1)))_\bot = f_\bot.(t_\bot.0, \ldots, t_\bot.(n-1))$.
It is straightforward to show that the interpretation of an extended term is also
monotonic w.r.t. $\sqsubseteq$. Every monotonic function on a CPO has a least fixpoint.

3.2 Constructivity as a Fixpoint Process

A partial valuation constructed during the simplification process can now be
represented as a total function from $X \cup Y$ to $T_\bot$, where currently undefined
variables are given the value $\bot$. An initial valuation V is a function that maps
X into T and Y to $\{\bot\}$. At each step, for some non-deterministically chosen
definition y ::= t, the current valuation V is updated to $V[y \leftarrow t^*_\bot.V]$.
By an argument [1] (cf. [5]) based on monotonicity, this non-deterministic process
terminates with a valuation that is the simultaneous least fixpoint of the
derived set of equations $\{y = t_\bot \mid (y ::= t) \in E\}$. For a simultaneous
definition C = (E, X, Y), let $(\mathit{lfp}\ Y : E^*.(X, Y))$ denote this least fixpoint.
The fixpoint depends on, and is defined for, each valuation of X. The constructivity
definition can now be re-stated as follows.

Definition 2 (Constructivity-FIX). A simultaneous definition (E, X, Y) is
semantically acyclic iff for each initial valuation V, the vector
$(\mathit{lfp}\ Y : E^*.(V, Y))$ has no $\bot$-components.

For a vector v over $T_\bot$, let $\bot\mathit{free}.v$ be the predicate
$(\forall i :: v.i \neq \bot)$. The constructivity condition is precisely
$(\forall v : \bot\mathit{free}.v : \bot\mathit{free}.(\mathit{lfp}\ Y : E^*.(v, Y)))$.
Malik [9] checks a weaker condition in which the set of internal variables Y has
a subset of “output” variables W. Let $\mathit{output}\bot\mathit{free}.v$ be the
predicate $(\forall i : i \in W : v.i \neq \bot)$. Malik’s condition can be phrased as:
$(\forall v : \bot\mathit{free}.v : \mathit{output}\bot\mathit{free}.(\mathit{lfp}\ Y : E^*.(v, Y)))$.

Checking the Constructivity-FIX condition independently for each initial
valuation is inefficient. Malik, Berry, Touati and Shiple [9,2,12,11] use a derived
scheme that operates on sets of external valuations. If the type T has k
elements, the scheme associates k subsets with each variable y in Y: the set y.i,
$i \in [0,k)$, contains external valuations for which the variable y evaluates to i.
These subsets are updated by set operations derived from the semantics of the
basic operators. For instance, for the definition “$x ::= y \land z$”, the updates are
given by $x.\mathit{false} = y.\mathit{false} \cup z.\mathit{false}$, and
$x.\mathit{true} = y.\mathit{true} \cap z.\mathit{true}$.

This scheme has two limitations that arise for non-Boolean types: (i) the
algorithm has to maintain k sets for each variable, and (ii) the set operations
needed can be quite complex when the basic operators include (bounded)
arithmetic. For example, for the definition x ::= y + z, x.k would be defined as
$y.l \cap z.m$, for various partitions of k as l + m; similarly, for x ::= y * z, x.k
would be defined as $y.l \cap z.m$, for various factorizations of k as l * m. Our new
formulation, Constructivity-SAT, changes Constructivity-FIX to a satisfiability
question and avoids these difficulties.



3.3 Constructivity as Satisfiability 

The new formulation (apparently) strengthens the Constructivity-FIX
definition to require that every fixpoint of $E^*$ is $\bot$-free. The equivalence of the
two formulations is shown in Theorem 1.

Definition 3 (Constructivity-SAT). A simultaneous definition (E, X, Y) is
semantically acyclic iff
$(\forall v, u : \bot\mathit{free}.v \land u = E^*.(v, u) : \bot\mathit{free}.u)$.

Lemma 0. For a monotone property P and a monotone function f on a CPO
Q, $P.(\mathit{lfp}\ X : f.X)$ iff $(\forall u : u = f.u : P.u)$.

Proof. The implication from right to left is trivially true, as
$(\mathit{lfp}\ X : f.X)$ satisfies the condition u = f.u. For the other direction, note
that the fixpoints of f are partially ordered by $\sqsubseteq$, with the least fixpoint
below any other fixpoint. By the monotonicity of P, if P holds of the least fixpoint,
it holds of every fixpoint. □

Theorem 1. Constructivity-FIX and Constructivity-SAT are equivalent. 

Proof. For any simultaneous definition C = (E, X, Y),

C satisfies Constructivity-FIX
= { by definition }
$(\forall v : \bot\mathit{free}.v : \bot\mathit{free}.(\mathit{lfp}\ Y : E^*.(v, Y)))$
= { $\bot\mathit{free}$ is monotone w.r.t. $\sqsubseteq$; Lemma 0 }
$(\forall v : \bot\mathit{free}.v : (\forall u : u = E^*.(v, u) : \bot\mathit{free}.u))$
= { rearranging }
$(\forall v, u : \bot\mathit{free}.v \land u = E^*.(v, u) : \bot\mathit{free}.u)$
= { by definition }
C satisfies Constructivity-SAT

□
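
Theorem 1 can be checked on the small Boolean examples of Section 2 by
evaluating both conditions exhaustively. The Python sketch below uses `None`
for $\bot$; the term encoding and the names `ev`, `lfp`, `fix_ok` and `sat_ok`
are assumptions, and the enumeration merely stands in for a real
satisfiability procedure.

```python
from itertools import product

BOT = None  # plays the role of the bottom element

def ev(t, val):
    """Monotone evaluation of a term over Booleans extended with BOT."""
    if isinstance(t, bool):
        return t
    if isinstance(t, str):
        return val[t]
    op, *a = t
    if op == "not":
        x = ev(a[0], val)
        return BOT if x is BOT else not x
    if op == "and":
        x, y = ev(a[0], val), ev(a[1], val)
        if x is False or y is False:
            return False
        return BOT if x is BOT or y is BOT else True
    if op == "if":
        c = ev(a[0], val)
        return BOT if c is BOT else ev(a[1] if c else a[2], val)
    raise ValueError(op)

def lfp(defs, ext):
    """Least fixpoint: start with all internal variables at BOT and iterate."""
    val = dict(ext, **{y: BOT for y in defs})
    while True:
        new = dict(val, **{y: ev(t, val) for y, t in defs.items()})
        if new == val:
            return val
        val = new

def fix_ok(defs, externals):
    """Constructivity-FIX: the least fixpoint has no BOT components."""
    for vs in product((False, True), repeat=len(externals)):
        val = lfp(defs, dict(zip(externals, vs)))
        if any(val[y] is BOT for y in defs):
            return False
    return True

def sat_ok(defs, externals):
    """Constructivity-SAT: no solution of u = E*(v, u) contains BOT."""
    for vs in product((False, True), repeat=len(externals)):
        ext = dict(zip(externals, vs))
        for us in product((BOT, False, True), repeat=len(defs)):
            val = dict(ext, **dict(zip(defs, us)))
            is_fixpoint = all(ev(t, val) == val[y] for y, t in defs.items())
            if is_fixpoint and BOT in us:
                return False
    return True

ex1 = {"p": ("if", "x", "y", "q"), "q": ("if", "x", "p", "x")}
ex2 = {"p": ("and", "q", "x"), "q": "p"}
ex3 = {"p": ("and", "p", "x"), "q": ("if", "p", ("not", "q"), False)}
```

On these three examples `fix_ok` and `sat_ok` agree: Example 1 is accepted and
Examples 2 and 3 are rejected. The second function never computes a fixpoint;
it only asks whether some solution of the extended equations contains $\bot$.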



The extension of a function f from $T^n$ to $T^n_\bot$ can be defined in general as
follows: the value of the extension at a vector v is the greatest lower bound of the
function values at $\bot$-free vectors above v in the $\sqsubseteq$ order. Formally,
$f^*_\bot.v = (\mathit{glb}\ w : \bot\mathit{free}.w \land v \sqsubseteq w : f^*.w)$.
It is straightforward to show that this is a monotone
extension of f. The extensions of basic arithmetic and Boolean functions are
easily determined by this formula. For example, the extension of $\land$ is given by:

$u \land_\bot v$ = false if u = false or v = false; otherwise,
                 $\bot$ if $u = \bot$ or $v = \bot$; otherwise,
                 $u \land v$

To illustrate the use of the general formulation, we can check that 
u A± false 

= { by the general formulation } 

{gib x^y : X ^ Y A y Y A u F x A false A y : x A y) 

= { definition of ^ } 

{gib X : X Y A u F x \ x A false) 

= { definition of A } 

{gib X : X Y A u F x \ false) 

= { definition of gib } 

false 
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The extension of is similar to that for A, with 0 substituted for false. The 
extension of + is given below : 

u V = _L if iz = _L or r’ = _L; otherwise, 

U ^ V 

The extensions of other basic operators can be defined equally easily. The new 
formulation thus overcomes both the limitations of the earlier one: the extensions 
are easy to define and compute, and we do not need to maintain sets of valuations 
for each variable; the only changes required are to extend both the types of 
variables and the definitions of the basic operators. 
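
The general glb-based formulation can be executed directly on small finite types.
The sketch below is illustrative only: it represents $\bot$ by `None`, enumerates all
$\bot$-free vectors explicitly (which a BDD-based implementation would not do), and
uses arithmetic modulo 4 as an assumed stand-in for bounded arithmetic.

```python
from itertools import product

BOT = None  # stands for the undefined value ⊥


def below(v, w):
    """v ⊑ w componentwise: each component of v is ⊥ or equals that of w."""
    return all(a is BOT or a == b for a, b in zip(v, w))


def glb(values):
    """Greatest lower bound in the flat order: the common value, else ⊥."""
    values = set(values)
    return values.pop() if len(values) == 1 else BOT


def extend(f, domain):
    """Return the extension of f: glb of f.w over ⊥-free vectors w above v."""
    def f_bot(*v):
        tops = [w for w in product(domain, repeat=len(v)) if below(v, w)]
        return glb(f(*w) for w in tops)
    return f_bot


# Extended conjunction over the Booleans.
and_bot = extend(lambda u, v: u and v, (False, True))

# Extended addition and multiplication over {0, 1, 2, 3}, taken modulo 4.
add_bot = extend(lambda u, v: (u + v) % 4, range(4))
mul_bot = extend(lambda u, v: (u * v) % 4, range(4))
```

For instance, `and_bot(BOT, False)` evaluates to `False` and `mul_bot(0, BOT)` to `0`,
matching the case definitions, while `add_bot(1, BOT)` is `BOT`.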

3.4 Indexed Variables 

In many input languages, including the S/R language of the COSPAN system, 
it is possible to declare arrays of variables. If z is such an array variable, def- 
initions of the form z[c] ::= t, where c is a constant, can be handled with 
the machinery presented earlier, by treating z[c] as an ordinary variable. A 
definition of the form z[e] ::= t, however, where e is a non-constant term, 
cannot be handled with the earlier machinery, as it corresponds to the set 
of definitions { z[c] ::= if (e = c) then t | c ∈ indices(z) }. Notice that the term
if (e = c) then t is a partial function. As a typical instance, consider the following
definition, where z is an array indexed by {0, 1}, and x and y are variables. 

z[x] ::= a 

z[y] ::= b 

The semantics of S/R requires that the valuations of x and y be distinct. The
defining term for z[0] is the partial function if x = 0 then a else if y = 0 then b.
This term may itself be considered as a partial function on $T_\bot$, defined only
for x = 0 and y = 0. With this interpretation, it is monotonic$^1$ w.r.t. $\sqsubseteq$.
Recombining the terms for z[0] and z[1], one obtains the following modification
(for $T_\bot$) of the original definitions for z[x] and z[y]:

z[x] ::= if x ≠ ⊥ then a
z[y] ::= if y ≠ ⊥ then b

These definitions contribute in the following way to the induced “equations”:

(x ≠ ⊥) ⇒ (z[x] = a)

(y ≠ ⊥) ⇒ (z[y] = b)

4 Implementation 

We have implemented this new formulation in the COSPAN/FormalCheck verification
system [7]. The input language for the COSPAN system is S/R
(“selection/resolution”) [8]. An S/R program consists of a number of processes,
which may be viewed as Mealy machines with Rabin/Streett-type acceptance
conditions. The variables of each process are either state or selection variables.
Selection variables, in turn, are either free (unconstrained) inputs or combinational
variables used to determine the next-state relation of the system [8]. In
the terminology of the earlier sections, the state variables together with the free
input variables form the “external” variables, since the inputs and state variables
do not change value for the duration of the selection cycle; the other selection
variables form the “internal variables”.

$^1$ A partial function $f$ is monotonic w.r.t. a partial order $\preceq$ iff whenever
$x \preceq y$ and $f$ is defined at $x$, $f$ is defined at $y$ and $f.x \preceq f.y$.

There are no restrictions in S/R on the dependencies between selection 
variables: selection variables declared within a process may be mutually inter- 
dependent and may be used as inputs to other processes, thus potentially intro- 
ducing syntactic cycles that span process boundaries. In addition, the presence 
of semantic cycles may depend on the valuation of the state variables. For
instance, a semantic cycle may be “unreachable”, if the particular states in which
it is induced are unreachable. The question, then, is to identify whether any
semantic cycles are present in reachable states of the program for some free-input
valuation (this problem is shown to be PSPACE-complete in [11]).

The S/R compiler parses the program and analyzes syntactic dependencies 
among internal variables. If there is a syntactic cycle, it identifies a set of
internal variables whose elimination would break each syntactic cycle; such a set
is commonly called a “feedback vertex set”. The parser retains the variables in
the feedback vertex set, and macro-expands the other variables, so that the
variables in the feedback vertex set are defined in terms of themselves and the input
and state variables. In the terminology used earlier, these remaining internal 
variables and their defining terms form the simultaneous definition that is to be 
analyzed. We will refer to these internal variables as the “relevant” variables. 
Each relevant variable is treated as a state variable for reachability analysis. 
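
The selection of a feedback vertex set can be sketched as follows. This is a
simple greedy procedure written for illustration, not the method used by the S/R
compiler (which the text does not specify): it repeatedly finds a syntactic cycle
and removes one variable on it. The dependency map `deps` is a hypothetical input
in which each internal variable is mapped to the variables it reads.

```python
def find_cycle(graph):
    """Return a list of vertices forming a directed cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    stack = []

    def visit(v):
        color[v] = GREY
        stack.append(v)
        for w in graph[v]:
            if w not in color:
                continue  # external (state or input) variable, or removed
            if color[w] == GREY:
                return stack[stack.index(w):]  # back edge closes a cycle
            if color[w] == WHITE:
                cycle = visit(w)
                if cycle:
                    return cycle
        stack.pop()
        color[v] = BLACK
        return None

    for v in graph:
        if color[v] == WHITE:
            cycle = visit(v)
            if cycle:
                return cycle
    return None


def feedback_vertex_set(graph):
    """Greedy (not necessarily minimum) set of vertices breaking all cycles."""
    graph = {v: set(ws) for v, ws in graph.items()}
    fvs = set()
    while True:
        cycle = find_cycle(graph)
        if cycle is None:
            return fvs
        fvs.add(cycle[0])
        del graph[cycle[0]]


# Hypothetical dependencies: a -> b -> c -> a is a syntactic cycle; "s" is external.
deps = {"a": {"b"}, "b": {"c"}, "c": {"a", "s"}, "d": {"a"}}
relevant = feedback_vertex_set(deps)
remaining = {v: ws - relevant for v, ws in deps.items() if v not in relevant}
```

Here `relevant` contains a single variable from the cycle, and `remaining` is
acyclic, so the other variables can be macro-expanded.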

Our implementation uses a single MTBDD terminal to represent _L. While 
MTBDD’s for multiplication are exponential in the number of bits, they repre- 
sent most other operations efficiently and are therefore used in COSPAN. The 
types of the relevant variables are extended to include the _L-terminal. The types 
of input and state variables are not extended. The implementation includes a
library of extended basic operators, defined as described in Section 2. These extend
the basic operators of S/R, including Boolean operators, arithmetic operators
such as +, *, div, exp, mod, and conditional operators such as if-then-else.

Each definition x ::= t of a relevant non-indexed variable is converted to the
equation $x = t^{*}$, while a definition z[e] ::= t of an indexed variable is converted
to $(e^{*} \neq \bot) \Rightarrow (z[e^{*}] = t^{*})$, as described in Section 3.4. The
conjunction of these formulae forms the simultaneous fixpoint term
$Y = E^{*}.(S, X, Y)$, where $S$ is the set of state variables, $X$ is the set of free
input variables, and $Y$ is the set of relevant variables. The Constructivity-SAT
formula determines (by negation) the following predicate on state variables:

\[
\mathit{Cyclic}.S = (\exists X, Y : Y = E^{*}.(S, X, Y) \wedge \neg\bot\mathit{free}.Y).
\]

The predicate $\neg\mathit{Cyclic}$ is checked for invariance during reachability
analysis; if it fails, the system automatically generates an error-track leading from
an initial state to a state $s$ such that $\mathit{Cyclic}.s$ is true. It is not difficult
to recover the set of variables involved in a semantic cycle for a particular input
$k$ at state $s$ by inspecting the BDD for
$(Y = E^{*}.(s, k, Y) \wedge \neg\bot\mathit{free}.Y)$: every path to
a 1-node includes variables that have value $\bot$; these variables are involved in a
semantic cycle.
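
The invariance check can be mimicked with explicit enumeration in place of BDDs.
The sketch below is an assumption-laden illustration: the one-variable definition
`E`, the four-state counter, and the two successor functions are all hypothetical,
and no error-track is reconstructed.

```python
from itertools import product

BOT = None  # stands for the undefined value ⊥


def not3(u):
    return BOT if u is BOT else (not u)


def cyclic(E, s, inputs, y_domain, n):
    """Cyclic.s: some input x and fixpoint y = E(s, x, y) where y contains ⊥."""
    return any(E(s, x, y) == y and BOT in y
               for x in inputs
               for y in product(y_domain, repeat=n))


def find_cyclic_state(init, successors, is_cyclic):
    """Explore reachable states; return one violating ~Cyclic, or None."""
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        if is_cyclic(s):
            return s
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None


def E(s, x, y):
    # Hypothetical definition: y0 ::= if s = 3 then ~y0 else x.
    return (not3(y[0]) if s == 3 else x,)


INPUTS = (False, True)
Y_DOMAIN = (False, True, BOT)


def is_cyclic(s):
    return cyclic(E, s, INPUTS, Y_DOMAIN, 1)


def wrapping(s):      # counter 0, 1, 2, 3, 0, ... reaches state 3
    return [(s + 1) % 4]


def saturating(s):    # counter stops at 2, so state 3 is unreachable
    return [min(s + 1, 2)]
```

With the wrapping counter the search returns state 3; with the saturating counter
the semantic cycle is unreachable and the search returns `None`.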

The description above should indicate that implementing the constructivity 
check with the new formulation is a fairly simple process. We have experimented 
with this implementation on a test suite for COSPAN formed of several large 
programs that represent real designs. While syntactic cycles are usually short, 
some of our examples had cycles of length greater than 20. Our experience has 
been that, in most cases, the run-time and BDD sizes increase, if at all, by a 
negligible amount. There are a few cases where the BDD sizes increase by a large 
amount, and even some where the sizes decrease - this seems to be attributable 
to the irregular behavior of the dynamic reordering algorithms. We have not 
conducted a thorough comparison with the algorithm in [12], but it is reasonable 
to expect that our algorithm will be more efficient for non-Boolean variables, as
it avoids both the large number of BDD’s and the fixpoint computation. It is 
less certain whether our algorithm offers a large improvement on the earlier 
one in the case when all variables are Boolean; this would require experimental 
comparison. In any case, the potential benefits of detecting semantic cycles before 
circuit fabrication far outweigh the disadvantage of the (usually small) time and 
memory increases that we have observed for our detection process. 

5 Related Work and Conclusions 

The work most related to ours is by Berry [2] and Shiple [11]. Berry proposed the 
original operational formulation of constructivity (Constructivity) and the de- 
notational formulation (Constructivity-FIX), based on work by Malik [9]. These 
definitions are based on computational processes - one would prefer a non- 
computational definition of the concept of “semantic acyclicity” . Shiple, Berry 
and Touati [12,2,11] and Malik [9] propose symbolic, fixpoint-based algorithms to 
check constructivity. These algorithms are difficult to implement and somewhat 
inefficient for variables with non-Boolean types. 

Our new formulation overcomes both limitations, by presenting a simple,
non-computational definition of constructivity (Constructivity-SAT) and a symbolic
algorithm based on the new formulation that is simple to implement for variables 
with arbitrary finite types. Our initial experiments with the implementation of 
this algorithm in the formal verification system COSPAN/FormalCheck indicate 
that in most cases it has minimal, if any, adverse impact on the execution time 
and BDD sizes. It should be quite easy to incorporate this algorithm into other 
verification and synthesis tools. As in [11], one can also determine the set of 
input values for which a circuit is constructive, by not quantifying over v in the 
Constructivity-SAT definition. 

In [11] Shiple considers a class of cyclic combinational circuits whose behavior
is based on the assumption that the circuit retains “state” across clock cycles
(for example, a flip-flop implemented by a pair of cross-connected NAND gates).
It would be interesting to see if our formulation of constructivity can be modified
to analyze such sequential behavior.

Acknowledgements: Thanks to Tom Szymanski for providing references to 
work on constructivity, and to Kousha Etessami, Mihalis Yannakakis, and Jon 
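Riecke for useful comments and discussions about this work.

6 Appendix

This appendix contains the definitions of the extended basic operators of S/R.

Boolean Operators:

\[
u \wedge_\bot v =
\begin{cases}
\mathit{false} & \text{if } u = \mathit{false} \text{ or } v = \mathit{false}; \text{ otherwise,} \\
\bot & \text{if } u = \bot \text{ or } v = \bot; \text{ otherwise,} \\
u \wedge v
\end{cases}
\]

\[
u \vee_\bot v =
\begin{cases}
\mathit{true} & \text{if } u = \mathit{true} \text{ or } v = \mathit{true}; \text{ otherwise,} \\
\bot & \text{if } u = \bot \text{ or } v = \bot; \text{ otherwise,} \\
u \vee v
\end{cases}
\]

\[
\neg_\bot u =
\begin{cases}
\bot & \text{if } u = \bot; \text{ otherwise,} \\
\neg u
\end{cases}
\]

Arithmetic Operators:

\[
u +_\bot v =
\begin{cases}
\bot & \text{if } u = \bot \text{ or } v = \bot; \text{ otherwise,} \\
u + v
\end{cases}
\]

\[
u *_\bot v =
\begin{cases}
0 & \text{if } u = 0 \text{ or } v = 0; \text{ otherwise,} \\
\bot & \text{if } u = \bot \text{ or } v = \bot; \text{ otherwise,} \\
u * v
\end{cases}
\]

\[
u \ \mathit{div}_\bot\ v =
\begin{cases}
0 & \text{if } u = 0; \text{ otherwise,} \\
\bot & \text{if } (u = \bot \text{ and } v \neq 0) \text{ or } v = \bot; \text{ otherwise,} \\
u \ \mathit{div}\ v
\end{cases}
\]

\[
u \ \mathit{mod}_\bot\ v =
\begin{cases}
0 & \text{if } u = 0 \text{ or } v = 1; \text{ otherwise,} \\
\bot & \text{if } (u = \bot \text{ and } v \neq 0) \text{ or } v = \bot; \text{ otherwise,} \\
u \ \mathit{mod}\ v
\end{cases}
\]

\[
u \ \mathit{exp}_\bot\ v =
\begin{cases}
0 & \text{if } u = 0; \text{ otherwise,} \\
1 & \text{if } u = 1 \text{ or } v = 0; \text{ otherwise,} \\
\bot & \text{if } u = \bot \text{ or } v = \bot; \text{ otherwise,} \\
u \ \mathit{exp}\ v
\end{cases}
\]

Comparison Operators:

\[
u <_\bot v =
\begin{cases}
\bot & \text{if } u = \bot \text{ or } v = \bot; \text{ otherwise,} \\
u < v
\end{cases}
\]

\[
u \leq_\bot v =
\begin{cases}
\bot & \text{if } u = \bot \text{ or } v = \bot; \text{ otherwise,} \\
u \leq v
\end{cases}
\]

\[
u =_\bot v =
\begin{cases}
\bot & \text{if } u = \bot \text{ or } v = \bot; \text{ otherwise,} \\
u = v
\end{cases}
\]

Conditional Operators:

\[
(\text{if } c \text{ then } u \text{ else } v)_\bot =
\begin{cases}
u & \text{if } c = \mathit{true}; \text{ otherwise,} \\
v & \text{if } c = \mathit{false}; \text{ otherwise,} \\
u & \text{if } c = \bot \text{ and } u = v; \text{ otherwise,} \\
\bot & \text{if } c = \bot \text{ and } u \neq v
\end{cases}
\]

\[
(\text{if } c \text{ then } u)_\bot = u \quad \text{if } c = \mathit{true}
\]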
Abstract. BDDs and their algorithms implement a decision procedure
for Quantified Propositional Logic. BDDs are a kind of acyclic automata.
Unrestricted automata (recognizing unbounded strings of bit vectors) can
be used to decide more expressive monadic second-order logics. Prime
examples are WS1S, a number-theoretic logic, or a string-based notation
such as those proposed in some introductory texts. It is not clear which
one is to be preferred. Also, the inclusion of first-order variables in either
version is problematic since their automata-theoretic semantics depends
on restrictions.

In this paper, we provide a mathematical framework to address these
problems. We introduce three and six-valued characterizations of regular
languages under restrictions. From properties of the resulting congruences,
we are able to carry out detailed state space analyses that allow
us to solve the two problems in WS1S in a way that requires no extra
normalization calculations compared to a naive decision procedure for
string-oriented logic.

We report briefly on the practical experiments that support our results.
We conclude that WS1S with first-order variables is the superior choice
among monadic second-order logics.



1 Motivation 

Büchi [2] and Elgot [4], and independently Trakhtenbrot [13], argued almost forty
years ago that a logical notation, now called the Weak Second-order theory of
1 Successor or WS1S, would be a more natural alternative to what already
was known as regular expressions. WS1S has an extremely simple syntax and
semantics: it is a variation of predicate logic with first-order variables that denote
natural numbers and second-order variables that denote finite sets of natural
numbers; it has a single function symbol, which denotes the successor function,
and has usual comparison operators such as $<$, $=$, $\in$ and $\supseteq$. Büchi, Elgot, and
Trakhtenbrot showed that a decision procedure exists for this logic. The idea
is to view interpretations as finite strings over bit vectors and then to show by
explicit constructions of automata that the set of satisfying interpretations for
any subformula is a regular language. A distinguishing feature of this
number-theoretic approach is that the semantics refers to all the natural numbers or
all finite subsets.

In contrast, the logical semantics often suggested in explanations of the
logic-automaton connection, such as in [11,12], is tied to the finiteness of the strings
of a regular language. Here, the notation is interpreted over a string, which is
fixed for the purpose of the semantics. The string defines a set of positions from
0 to the length of the string minus 1; then, first-order variables range over this
set, and second-order variables over its subsets. This string-theoretic approach is
appealing for certain applications, for example in the description and verification
of parameterized hardware [1]. Among other names, these logics have been called
MSO(S) [12], SOM[+1] [11], and M2L(Str) [5,7]. They vary slightly, but we will
identify them as M2L(Str) in this paper.

There are at least three important reasons for preferring the number-theoretic
approach. (1) Its mathematical semantics is simpler. (2) WS1S appears to be
the stronger logic: it is easy to encode Presburger arithmetic in WS1S, but
no similar encoding is known for the string-theoretic formulation. Presburger
arithmetic by itself is a promising verification technique, see [3,10]. (3) There
are semantic problems in the string-theoretic formulation as pointed out in [7];
for example, what does a first-order variable denote if the string is empty and
thus defines no positions?

Even so, it is not obvious that any string-theoretic problem solved by a decision
procedure for M2L(Str) can be effectively encoded in WS1S. More precisely,
we desire an efficient translation algorithm, which we define to be one that in
linear time transforms any formula $\phi$ in M2L(Str) to a formula $\phi'$ in WS1S
such that $\phi'$ is decided in time linear in the time that $\phi$ is decided. Let us call
the question of finding such an algorithm the translation problem. In practice,
of course, we want something stronger: the total running time of going around
WS1S should be no longer than using the M2L(Str) decision procedure directly.

Another problem with monadic second-order logics is that first-order va- 
riables and terms are handled by formula rewritings transforming them into 
second-order variables subjected to logical restrictions. Consequently, automata 
corresponding to subformulas are not simply determined by the mathematical 
semantics, but also by details of the rewritings. Alternatively, extra automata 
product operations can be used to normalize these intermediate automata with 
automata corresponding to the restrictions. The first-order semantics problem 
is to find a representation that is no bigger than a normalized representation, 
while not requiring extra normalization steps. 

Contributions of This Paper 

In this paper, we propose solutions to the translation problem and the first-order 
semantics problem. Our solutions are based on a theory of restrictions that we 
develop as follows. 

We formulate a syntax for WS1S, where restrictions are made explicit, and
we provide initially three different semantics: (1) the ad hoc semantics that
corresponds to the usual treatment of first-order variables, (2) the conjunctive
semantics, where all the intermediate automata are conjoined with restrictions,
and (3) the three-valued semantics. We explain why the ad hoc semantics must
be rejected, and why the conjunctive semantics would slow down the decision
procedure. We show that the three-valued semantics makes most normalizations
unnecessary. Also, we indicate how the three-valued semantics can be realized
using an automata-theoretic approach adapted from the standard WS1S decision
procedure.

To study the question of automata sizes, we give a detailed congruence- 
theoretic analysis of a regular language under restrictions. We introduce a notion 
of a thin language, and we show that the restrictions occurring in the treatment 
of first-order variables and in the translation problem are thin. We prove that 
languages under thin restrictions make comparisons of the conjunctive semantics
and the three-valued semantics easy: the latter are the same as the former
except for some extra equivalence classes that we characterize. We show that 
if the automata of restrictions are bounded, then the sizes of intermediate au- 
tomata occurring under the three-valued semantics are, to within a constant 
factor, the same as the sizes of automata of the conjunctive semantics. 

We strengthen this result by exhibiting congruences based on a six-valued
semantics that are no bigger (to an additive constant of 3) than those of the
conjunctive semantics. Our main result is that the resulting decision procedure,
while requiring only few normalizations, involves intermediate automata that
are up to exponentially smaller than the ones occurring under the conjunctive
semantics.

Finally, we report on our integration of the theory presented here into the
tool Mona [9], which implements a decision procedure for WS1S. We conclude
that WS1S, and not a string-oriented logic, is the superior logical interface to
automata calculations.

2 WS1S: Review and Issues

Nutshell WS1S can be presented as follows. A formula $\phi$ is composite and of
the form ~$\phi'$, $\phi'$ & $\phi''$, or ex2 $P^i$ : $\phi'$, or atomic and of the form
$P^i$ sub $P^j$, $P^i$ <= $P^j$, $P^i$ = $P^j \setminus P^k$, or $P^i$ = $P^j$ + 1. Here, we have
assumed that variables are all second-order and named $P^i$, where $i \geq 1$. Other
comparison operators, second-order terms with set-theoretic operators, and Boolean
connectives can be introduced by trivial syntactic abbreviations, see [9,12]. The
treatment of first-order terms is discussed later.

Semantics of WS1S Given a fixed main formula $\phi_0$, which we sometimes
regard as an abstract syntax tree (with its root facing up), we define its semantics
inductively relative to a string $w$ over the alphabet $\mathbb{B}^k$, where
$\mathbb{B} = \{0,1\}$ and $k$ is the number of variables in $\phi_0$. We assume that
$\phi_0$ is closed and that each variable is bound in at most one occurrence of an
existential quantifier. Generally, we assume that all formulas are subformulas of
$\phi_0$. We now regard a string $w = a_0 \cdots a_{\ell-1}$, where $\ell = |w|$ is the
length of $w$, to be of the form:

\[
\begin{array}{c|ccc}
P^1 & a_0^1 & \cdots & a_{\ell-1}^1 \\
\vdots & \vdots & & \vdots \\
P^k & a_0^k & \cdots & a_{\ell-1}^k
\end{array}
\]

where we have indicated that if the string is viewed as a matrix, then row $i$
is called the $P^i$-track. Each letter $a$ is sometimes written in a transposed
notation as $(a^1, \ldots, a^k)^t$. The interpretation $w(P^i)$ of $P^i$ defined by $w$ is
the finite set $\{j \mid \text{the } j\text{th bit in the } P^i\text{-track is } 1\}$. Note that
suffixing $w$ with any string consisting of letters of the form $(0, \ldots, 0)^t$ does not
change the interpretation of any variable. Therefore, we will say that $w$ is minimum
if it possesses no such non-empty suffix.

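The semantics of a formula $\phi$ can now be defined inductively relative to an
interpretation $w$. We use the notation $w \models \phi$ (which is read: $w$ satisfies
$\phi$) if the interpretation defined by $w$ makes $\phi$ true:

\[
\begin{array}{lcl}
w \models {\sim}\phi' & \text{iff} & w \not\models \phi' \\
w \models \phi' \ \&\ \phi'' & \text{iff} & w \models \phi' \text{ and } w \models \phi'' \\
w \models \text{ex2 } P^i : \phi' & \text{iff} & \exists \text{ finite } M \subseteq \mathbb{N} : w[P^i \mapsto M] \models \phi' \\
w \models P^i \text{ sub } P^j & \text{iff} & w(P^i) \subseteq w(P^j) \\
w \models P^i \text{ <= } P^j & \text{iff} & \forall h \in w(P^i) : \exists k \in w(P^j) : h \leq k \\
w \models P^i = P^j \setminus P^k & \text{iff} & w(P^i) = w(P^j) \setminus w(P^k) \\
w \models P^i = P^j + 1 & \text{iff} & w(P^i) = \{h + 1 \mid h \in w(P^j)\}
\end{array}
\]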

where we use the notation $w[P^i \mapsto M]$ for the shortest string $w'$ that
interprets all variables $P^j$, $j \neq i$, as $w$ does, but interprets $P^i$ as $M$. Note
that if we here assume that $w$ is minimum, then $w$ is of the form
$\hat{w} \cdot w_0$, where all tracks, except the $P^i$-track, in $w_0$ are all 0s and
either $\hat{w}$ is empty or at least one non-$P^i$ track in $\hat{w}$ is of the form
$\mathbb{B}^* \cdot 1$. Then, $w'$ is of the form $\hat{w}' \cdot w_M$, where $\hat{w}'$
agrees with $\hat{w}$ on all non-$P^i$ tracks and $w_M$ is 0 everywhere
except for the $P^i$-track, which is of the form $\mathbb{B}^* \cdot 1$ if non-empty.

Note that the interpretation of $\phi_0$ is independent of $w$, since it is a closed
formula. Thus, $\phi_0$ is either true or false, and we write either $\models \phi_0$ or
$\not\models \phi_0$. For any formula $\phi$, we associate the language
$L_\phi = \{w \mid w \models \phi\}$.
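
The reading of strings as interpretations can be made concrete with a small sketch.
It is illustrative only: a string is assumed to be a list of $k$-bit tuples, tracks
are indexed from 0 rather than 1, and the helper names are hypothetical.

```python
def interpretation(w, k):
    """Return the list of sets denoted by the k tracks of the string w."""
    return [{j for j, letter in enumerate(w) if letter[i] == 1}
            for i in range(k)]


def minimum(w):
    """Strip trailing all-zero letters, which do not affect any variable."""
    w = list(w)
    while w and not any(w[-1]):
        w.pop()
    return w


# Satisfaction of the atomic formulas, given the sets denoted by the variables.
def sat_sub(pi, pj):
    return pi <= pj


def sat_leq(pi, pj):
    return all(any(h <= m for m in pj) for h in pi)


def sat_diff(pi, pj, pk):
    return pi == pj - pk


def sat_succ(pi, pj):
    return pi == {h + 1 for h in pj}


# Two tracks: the first denotes {0}, the second denotes {1}.
w = [(1, 0), (0, 1), (0, 0), (0, 0)]
p1, p2 = interpretation(w, 2)
```

Padding with zero letters leaves the interpretation unchanged, so
`interpretation(minimum(w), 2)` yields the same two sets as `interpretation(w, 2)`.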

2.1 Automata-Theoretic Semantics 

The automata-theoretic semantics defines a decision procedure that associates
to each $\phi$ the minimum automaton $A_\phi$ accepting the language $L_\phi$. For an
atomic formula, a small automaton (with at most three states) can be directly
constructed. For a formula $\phi$ of the form ~$\phi'$, the automaton $A_\phi$ is taken to
be the complement of the automaton $A_{\phi'}$, which is calculated by induction. Note
that this automata-theoretic semantics of negation is symmetric: the complement
automaton is gotten by just reversing final and non-final states. The case of
conjunction is handled by an automata-theoretic product construction, followed by
a minimization construction. Finally, the case of quantification is slightly more
complicated. Consider $\phi$ = ex2 $P^i$ : $\phi'$. We calculate $A_\phi$ from $A_{\phi'}$ by
means of an intermediate, nondeterministic automaton $A_{\phi''}$ that is gotten from
$A_{\phi'}$ in two steps. First, any state for which a path exists to an accepting state
along a string of letters of the form $(0, \ldots, 0, X, 0, \ldots, 0)^t$ (where the $X$
means that the value of the $i$th component is irrelevant) is made accepting. Second,
for any transition of the form $(s, a, s')$ from state $s$ to $s'$, we add the transition
$(s, a, s'')$, where $s''$ is the state reached according to the unique transition
$(s, \bar{a}, s'')$, with $\bar{a}$ being the same letter as $a$ except that the $i$th
component is negated. The automaton $A_\phi$ is then calculated by determinizing
$A_{\phi''}$, followed by a minimization construction.
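
The two-step treatment of quantification can be sketched on explicit automata. The
code below is an illustration under simplifying assumptions, not Mona's
implementation: automata are tuples `(states, initial, accepting, delta)` with
`delta` a dictionary over $k$-bit letters, tracks are indexed from 0, minimization is
omitted, and the example automaton (`nonempty`, accepting strings whose track 1
contains a 1) is hypothetical.

```python
from itertools import product


def project(dfa, i, k):
    """Build the nondeterministic automaton for quantifying away track i."""
    states, init, accepting, delta = dfa
    letters = list(product((0, 1), repeat=k))
    # Step 1: a state becomes accepting if acceptance is reachable along
    # letters that are 0 everywhere except possibly in component i.
    padding = [a for a in letters
               if all(a[j] == 0 for j in range(k) if j != i)]
    accepting = set(accepting)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in accepting and any(delta[s, a] in accepting
                                          for a in padding):
                accepting.add(s)
                changed = True

    # Step 2: component i is guessed, so both of its values are followed.
    def flip(a):
        return a[:i] + (1 - a[i],) + a[i + 1:]

    ndelta = {(s, a): {delta[s, a], delta[s, flip(a)]}
              for s in states for a in letters}
    return states, init, accepting, ndelta


def determinize(nfa, k):
    """Subset construction; the result is a deterministic automaton."""
    _, init, accepting, ndelta = nfa
    letters = list(product((0, 1), repeat=k))
    start = frozenset([init])
    seen, todo, delta = {start}, [start], {}
    while todo:
        S = todo.pop()
        for a in letters:
            T = frozenset(t for s in S for t in ndelta[s, a])
            delta[S, a] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return seen, start, {S for S in seen if S & accepting}, delta


def accepts(dfa, w):
    _, state, accepting, delta = dfa
    for a in w:
        state = delta[state, a]
    return state in accepting


STATES = ("n", "y")
DELTA = {(s, a): ("y" if s == "y" or a[1] == 1 else "n")
         for s in STATES for a in product((0, 1), repeat=2)}
nonempty = (STATES, "n", {"y"}, DELTA)
projected = determinize(project(nonempty, 1, 2), 2)
```

The original automaton rejects the empty string, whereas `projected` accepts it:
step 1 accounts for witnesses $M$ that extend beyond the end of the given string.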

2.2 Semantics of First-Order Variables 

Adding first-order variables to WS1S can easily be done as follows: a first-order
variable $p$ is regarded as a second-order term $P$ that is restricted to take on
values that are singleton sets, where the sole element denotes the value of $p$,
see [12,11,8]. The restriction can be imposed by conjoining a singleton predicate
singleton($P$) to the formula where $P$ is quantified. This ad hoc strategy
means that the semantics of a formula containing $p$ is not robust: its meaning on
interpretations $w$ not fulfilling singleton($P$) is not well-defined. Even if the
restriction is imposed whenever $p$ occurs in an atomic formula, the semantics is
not closed under complementation. For example, the formula $\phi$ = $p$=0, where
$p$ is first-order, is handled as $\phi'$ = $P$={0}, where $P$ is second-order. But the
complement of $\phi'$ is ~($P$ = {0}), something that is different from the
representation of ~($p$ = 0), namely ~($P$ = {0}) & singleton($P$). The solution is to
conjoin the restriction to every subformula in a procedure we call normalization.
Then, we would have a simple explanation of the language $L(\phi)$ that we
call the conjunctive semantics.
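
The failure of closure under complementation can be seen on a small universe. The
following sketch is only an illustration: it assumes that sets range over subsets of
{0, 1, 2} rather than all finite sets of naturals, and the helper names are
hypothetical.

```python
from itertools import combinations


def finite_sets(n):
    """All subsets of {0, ..., n-1}, as frozensets."""
    return [frozenset(c)
            for r in range(n + 1)
            for c in combinations(range(n), r)]


def singleton(P):
    return len(P) == 1


def phi(P):
    """The second-order rendering P = {0} of the first-order formula p = 0."""
    return P == frozenset({0})


universe = finite_sets(3)

# Plain complement of P = {0}: also contains non-singleton sets.
complement = {P for P in universe if not phi(P)}

# Intended meaning of ~(p = 0): the complement restricted to singletons.
intended = {P for P in universe if not phi(P) and singleton(P)}
```

The empty set and sets such as {0, 1} belong to `complement` but not to `intended`,
which is why the restriction has to be reimposed after negation.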

The practical problem with the conjunctive semantics is that additional product
and minimization calculations would be necessary: for each automaton $A$
representing a subformula $\phi$ and each free variable $P^i$, the automaton
representing the singleton property for $P^i$ must be conjoined to $A$. Such extra
calculational work slows down the decision procedure, probably by a factor of at
least two. (Complementation, which is normally fast since it consists of flipping
acceptance statuses of states, now would involve a product and a minimization
operation; and product operations would involve at least one additional product
and minimization even if the restrictions are calculated separately.) So in
practice, the Mona implementation (prior to the one implemented with the results
of this article) used the ad hoc strategy: the restriction for variable $p$ is
conjoined only to atomic formulas where $p$ occurs and to the formula in the
existential quantification introducing $p$.

Ad hoc emulation of string semantics in WS1S A simplified syntax for the
string-theoretic version of monadic second-order logic is the same as nutshell WS1S
syntax. The satisfaction relation is denoted $\models_{string}$; it is the same as for
WS1S except that quantification is changed to:

\[
w \models_{string} \text{ex2 } P^i : \phi' \quad \text{iff} \quad
\exists M \subseteq \{0, \ldots, |w| - 1\} : w[P^i \mapsto M] \models_{string} \phi'
\]

where the notation $w[P^i \mapsto M]$ now has a different meaning: it denotes the string
$w$ altered so that the $P^i$ track describes $M$. Thus, the witness string
$w[P^i \mapsto M]$ for the existential quantification has the same length as $w$. The
interpretation of $\phi_0$ on a string $w$ still does not depend on the individual tracks
of $w$, but it does depend on the length of $w$. Thus we write
$i \models_{string} \phi_0$ if $\phi_0$ holds for a string $w$ of length $i$. For example, a
closed formula can be written that under this semantics holds if and only if $w$ is of
even length.

To emulate $\models_{string}$ in $\models$, we must restrict all second-order terms to
sets of numbers less than or equal to the last position in the string. Thus, we introduce
a first-order variable \$ that simulates the entity $|w| - 1$. A \$-constraint for a
variable expresses that the variable is a subset of $\{0, \ldots, \$\}$. Then, we normalize
all formulas by conjoining \$-constraints for all free variables. The result is a
WS1S formula $\phi'$ with one free variable \$ such that
$i \models_{string} \phi \Leftrightarrow w \models \phi'$, where
the \$-track of $w$ interprets \$ as $i$. For example, the formula
ex1 p: ex1 q: p = q becomes in WS1S

ex2 P : ex2 Q :
  singleton(P) & singleton(Q) & singleton($)
  & P<=$ & Q<=$ & P sub Q & Q sub P

as expressed in nutshell syntax, whereas the M2L(Str) formulation is

ex2 P : ex2 Q :
  singleton(P) & singleton(Q)
  & P sub Q & Q sub P



Proposition 1. Under the translation outlined above, the minimized, canonical
automata arising during the WS1S decision procedure are essentially the same as
the ones arising during the M2L(Str) procedure except for one or two additional
states.

Proof. The WS1S automaton can be gotten from the M2L(Str) automaton by
considering the \$-track as some $P^i$ track and by adding states $s_{accept}$ (an
accepting state) and $s_{reject}$ (a rejecting state). The transition relation of the new
automaton is the same as for the old one as long as the \$-component is 0. When
the \$-component is 1, corresponding to the end of the string under the M2L(Str)
representation, a transition is made to $s_{accept}$ or $s_{reject}$ according to the accept
status of the state that would have been reached in the old automaton. From
$s_{accept}$, a transition is made to $s_{reject}$ if the \$-component is 1 or if any other
component corresponding to a first-order variable is 1; otherwise, the transition
is made to $s_{accept}$. The $s_{reject}$ state is connected to itself on all letters. The WS1S
automaton so described may not be minimum, since the reject state may already
have been present in the automaton. All other states of the old automaton are
still distinct when considered as part of the new automaton.

Our practical experiments with running string-based examples translated into
WS1S were based on this ad hoc strategy. We discovered the following problem.

Parity example Consider the formula ex1 p: (p in $P^1$ <=> ... <=> p in $P^n$) under
the string-theoretic semantics. The formula holds if and only if there is a position
contained in an even number of the sets $P^i$. Translated into nutshell WS1S under
the ad hoc strategy, the formula becomes:

ex2 P : (P in $P^1$ & singleton(P) & singleton($) & $P^1$<=$) <=>
        ... <=>                                                          (1)
        (P in $P^n$ & singleton(P) & singleton($) & $P^n$<=$).

Proposition 2. The parity formula (1) produces intermediate automata whose
size is doubly exponential in $n$. But if the restrictions are conjoined to all
subformulas, not only the atomic ones, then all intermediate automata have less than
6 states.

We formalize the ad hoc semantics in the next section; but already here, it is
clear that it is inadequate for restrictions.



3 WS1S with Restrictions and a Three-Valued Semantics

To give a precise understanding of restrictions, we introduce nutshell WS1S-R,
a variation on WS1S where restrictions are made explicit. Existential
quantification becomes ex2 $P^i$ where $\rho$ : $\phi'$. Let $\rho(P^i) = \rho$ be the
restriction of variable $P^i$. Also, we assume that each $P^i$ is restricted, possibly to
the formula $P^i = P^i$, i.e., true. The semantics we will propose for this syntax rely
on an exact understanding of the binding mechanisms in play. We say that in
$\rho(P^i)$, variable $P^i$ is $\rho$-bound. Variable $P^i$ is existentially bound in both
$\rho(P^i)$ and $\phi'$. A variable occurrence $P^i$ is free in the conventional part of
$\phi$ if $P^i$ is free in $\phi$ in the usual sense, where $\phi$ is regarded as an
independent formula, and the occurrence is not within a restriction of an existential
quantification within $\phi$. The relevant variables, RV($\phi$), for formula $\phi$ is the
least set of variables $P$ such that there is an occurrence of $P$ that is not
$\rho$-bound and that is free in the conventional part of $\phi$ or free in the
conventional part of $\rho(P^i)$, where $P^i \in$ RV($\phi$). We define the induced
restriction $\rho^*(\phi)$ to be the conjunction of the restrictions of relevant
variables, that is, $\bigwedge_{P^i \in \mathrm{RV}(\phi)} \rho(P^i)$.

To carry out inductive arguments, we define the partial ordering $\prec$ among
subformulas (regarded as nodes in the abstract syntax tree) as follows:
$\phi \prec \phi'$ if $\phi$ is a subformula of $\phi'$ or if there is a formula
$\phi''$ = ex2 $P^i$ where $\rho(P^i)$ : $\phi'''$ such that $\phi$ is a subformula of
$\rho(P^i)$ and $\phi'$ is a subformula of $\phi'''$. The partial
ordering $\prec$ is well-founded (a post-order labeling of nodes with numbers 0, 1, ...
produces an ordering containing $\prec$). Note for each $P \in$ RV($\phi$),
$\rho(P) \prec \phi$. This will ensure that the semantic definitions to follow make sense.



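The ad hoc semantics. We state the ad hoc semantics using a meaning function
$[\![\phi]\!]^{2}$ (anticipating multi-valued semantics):

$$[\![\phi' \mathbin{\&} \phi'']\!]^{2} w = [\![\phi']\!]^{2} w \wedge [\![\phi'']\!]^{2} w$$

$$[\![\mathtt{ex2}\ P^i\ \mathtt{where}\ \rho : \phi']\!]^{2} w =
\begin{cases}
1 & \text{if } \exists M : [\![\phi']\!]^{2} w[P^i \mapsto M] = 1 \text{ and } [\![\rho^*(P^i)]\!]^{2} w[P^i \mapsto M] = 1 \\
0 & \text{if } \forall M : [\![\phi']\!]^{2} w[P^i \mapsto M] = 0 \text{ or } [\![\rho^*(P^i)]\!]^{2} w[P^i \mapsto M] = 0
\end{cases}$$

$$[\![P^i\ \mathtt{sub}\ P^j]\!]^{2} w =
\begin{cases}
1 & \text{if } w \models P^i\ \mathtt{sub}\ P^j \text{ and } [\![\rho^*(P^i\ \mathtt{sub}\ P^j)]\!]^{2} w = 1 \\
0 & \text{if } w \not\models P^i\ \mathtt{sub}\ P^j \text{ or } [\![\rho^*(P^i\ \mathtt{sub}\ P^j)]\!]^{2} w = 0
\end{cases}$$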



We have only shown the semantics of one kind of atomic formula; the others 
are treated similarly. (The normalization of atomic formulas is optional.) 



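The conjunction semantics. This semantics is the same as the ad hoc semantics
except that the restrictions are also applied to the case of & and ~.

The three-valued semantics. Let $\mathbb{B}^3 = \mathbb{B} \cup \{\bot\}$ be the
extended Boolean domain. We use $\bot$ to denote a "don't care" situation, one
where not all the restrictions hold. Boolean operators $\wedge^3$ and $\neg^3$
are defined on this domain as for the usual case with the added rule that if
any argument is $\bot$, then the result is $\bot$.

$$[\![\phi' \mathbin{\&} \phi'']\!]^{3} w = [\![\phi']\!]^{3} w \wedge^3 [\![\phi'']\!]^{3} w$$

$$[\![\mathtt{ex2}\ P^i\ \mathtt{where}\ \rho : \phi']\!]^{3} w =
\begin{cases}
1 & \text{if } \exists M : [\![\phi']\!]^{3} w[P^i \mapsto M] = 1 \\
0 & \text{if } \forall M : [\![\phi']\!]^{3} w[P^i \mapsto M] \neq 1 \text{ and } \exists M : [\![\phi']\!]^{3} w[P^i \mapsto M] = 0 \\
\bot & \text{if } \forall M : [\![\phi']\!]^{3} w[P^i \mapsto M] = \bot
\end{cases}$$

$$[\![P^i\ \mathtt{sub}\ P^j]\!]^{3} w =
\begin{cases}
1 & \text{if } w \models P^i\ \mathtt{sub}\ P^j \text{ and } [\![\rho^*(P^i\ \mathtt{sub}\ P^j)]\!]^{3} w = 1 \\
0 & \text{if } w \not\models P^i\ \mathtt{sub}\ P^j \text{ and } [\![\rho^*(P^i\ \mathtt{sub}\ P^j)]\!]^{3} w = 1 \\
\bot & \text{if } [\![\rho^*(P^i\ \mathtt{sub}\ P^j)]\!]^{3} w \neq 1
\end{cases}$$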
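The three-valued rules of this section can be illustrated with a minimal
sketch. It assumes, purely for illustration, that the "don't care" value is
encoded as Python's None and that the candidate values M for the quantified
variable range over a small finite collection; the names and3, not3 and exists3
are hypothetical and do not come from the original text.

```python
from typing import Callable, Iterable, Optional

# Three-valued truth values: 1, 0, or None (standing for "don't care").
Tri = Optional[int]

def and3(a: Tri, b: Tri) -> Tri:
    """Conjunction: "don't care" is absorbing, otherwise ordinary Boolean and."""
    if a is None or b is None:
        return None
    return a & b

def not3(a: Tri) -> Tri:
    """Negation: "don't care" stays "don't care", otherwise ordinary not."""
    return None if a is None else 1 - a

def exists3(body: Callable[[object], Tri], candidates: Iterable[object]) -> Tri:
    """Existential case: 1 if some M yields 1; else 0 if some M yields 0;
    else "don't care" (every M yields "don't care")."""
    values = [body(m) for m in candidates]
    if 1 in values:
        return 1
    if 0 in values:
        return 0
    return None
```

Note that the existential case never consults the restriction of the quantified
variable directly; a candidate that violates a restriction simply contributes a
"don't care" value.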



Something seems to be missing in this semantics: the enforcement of a
restriction of a variable in an existential quantification. The proposition
below shows that the restriction bubbles up automatically if needed. The
semantics works only if we require that every restriction is satisfiable given
that the restrictions referred to by the restriction are already true. Formally,
for each subformula $\phi = \mathtt{ex2}\ P^i\ \mathtt{where}\ \rho : \phi'$ of
$\phi_0$, we require

$$\Bigl(\bigwedge_{P \in RV(\rho) \setminus \{P^i\}} \rho(P)\Bigr) \Rightarrow \mathtt{ex2}\ P^i : \rho \qquad (2)$$

The semantics is now justified as:

Proposition 3. Given the requirement (2), the following holds.

(a) $w \not\models \rho(P^i)$ for some $P^i$ in $RV(\phi)$ $\Leftrightarrow$ $[\![\phi]\!]^{3} w = \bot$

(b) $w \models \phi \mathbin{\&} \rho^*(\phi)$ $\Leftrightarrow$ $[\![\phi]\!]^{3} w = 1$

(c) $w \models {\sim}\phi \mathbin{\&} \rho^*(\phi)$ $\Leftrightarrow$ $[\![\phi]\!]^{3} w = 0$
Automata-theoretic realization of the three-valued semantics. The procedure
outlined in Section 2.1 can be modified to reflect the three-valued semantics.
The case of existential quantification requires a slightly more sophisticated
reclassification of the acceptance statuses of states prior to the subset
construction. Let us call the resulting algorithm the three-valued decision
procedure.

4 Congruences for Restricted Languages 

All languages considered will be regular and over the alphabet
$\Sigma = \mathbb{B}^k$. For a language $L$, the canonical right-congruence
$\sim_L$ is defined as $u \sim_L v$ iff
$\forall w : u \cdot w \in L \Leftrightarrow v \cdot w \in L$, where
$u, v, w \in \Sigma^*$. The set of congruence classes is denoted
$\Sigma^*/{\sim_L}$. This set can be regarded as the canonical, finite-state
automaton.

Consider languages $L$, sometimes called the property, and $R$, assumed
non-empty, called a restriction. Thus, the languages defined by $\phi$ and
$\rho^*(\phi)$ constitute such a pair for any subformula $\phi$ of $\phi_0$.
The conjunction representation is $L \cap R$, and the conjunction congruence is
$\sim_{L \cap R}$. The three-valued representation is not a language, but a
function $\chi^3_{L,R}$ defined to be 1 if $w \in L \cap R$, 0 if
$w \in \overline{L} \cap R$, and $\bot$ if $w \notin R$. The three-valued
congruence $\sim^3_{L,R}$ is then defined by $u \sim^3_{L,R} v$ iff for all $w$,
$\chi^3_{L,R}(u \cdot w) = \chi^3_{L,R}(v \cdot w)$.
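As a small illustration of the three-valued representation, the sketch below
evaluates it for a property and a restriction given as Python predicates on
strings. The single-track alphabet {"0", "1"}, the two example languages and
the name chi3 are assumptions made only for this example; None stands for the
"don't care" value.

```python
from typing import Callable, Optional

def chi3(in_L: Callable[[str], bool],
         in_R: Callable[[str], bool],
         w: str) -> Optional[int]:
    """Three-valued representation: 1 on L and R, 0 on (not L) and R,
    None ("don't care") outside R."""
    if not in_R(w):
        return None
    return 1 if in_L(w) else 0

# Hypothetical example languages over a single track.
in_R = lambda w: w.count("1") == 1   # restriction: exactly one 1 on the track
in_L = lambda w: w.startswith("1")   # property: the first letter is 1
```

For instance, chi3(in_L, in_R, "100") is 1, chi3(in_L, in_R, "010") is 0, and
chi3(in_L, in_R, "110") is None because "110" violates the restriction.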

4.1 Relating the Conjunction and Three-Valued Semantics

A thin language $R$ is a non-empty set of strings such that

$$\forall u, v, w : u \sim_R v \;\vee\; u \cdot w \notin R \;\vee\; v \cdot w \notin R \qquad (3)$$

In particular, the canonical automaton for $R$ has exactly one accepting state.

Proposition 4.

1. $R_{\mathrm{singleton}}(i) = \{ w \in (\mathbb{B}^k)^* \mid \text{track } i \text{ contains exactly one occurrence of a 1} \}$
   is thin.

2. The language
   $R_{\$\text{-restrict}}(i) = \{ w \in (\mathbb{B}^k)^* \mid$ the occurrences of 1
   in track $i$ are all in positions no greater than that of the first
   occurrence of a 1 in track $\$ \} \cap R_{\mathrm{singleton}}(\$)$ is thin.

3. If $R$ and $R'$ are thin and $R \cap R' \neq \emptyset$, then $R \cap R'$ is
   thin.

4. Let $R$ be thin, and let $L$ be any language. If $u \sim_{L \cap R} v$ and $u$
   has an accepting extension, then $u \sim_R v$ (and, consequently,
   $u \sim^3_{L,R} v$).

5. If $u$ and $v$ both have no accepting extensions, then
   $u \sim_R v \Leftrightarrow u \sim^3_{L,R} v$.

6. Thus, if $R$ is thin, then
   $|\Sigma^*/{\sim^3_{L,R}}| \leq |\Sigma^*/{\sim_{L \cap R}}| + |\Sigma^*/{\sim_R}|$.

From this proposition, it follows easily that all $\rho^*(\phi)$ are thin
languages if variables are subjected to first-order restrictions or
$-restrictions (or both). The proposition also tells us that
$\Sigma^*/{\sim^3_{L,R}}$ is pieced together from $\Sigma^*/{\sim_{L \cap R}}$
plus a subset of $\Sigma^*/{\sim_R}$.
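The thinness condition (3) can be explored by brute force on short strings.
The sketch below is only a bounded check under stated assumptions: a
single-track alphabet {"0", "1"}, strings of length at most max_len, and the
right-congruence approximated by agreement on all extensions up to that
length. The helper names are hypothetical, and a True answer is evidence
within the bound rather than a proof.

```python
from itertools import product

def strings(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def equivalent(in_R, u, v, alphabet, max_len):
    """Bounded test of the right-congruence: u and v agree on membership in R
    for every extension w of length up to max_len."""
    return all(in_R(u + w) == in_R(v + w) for w in strings(alphabet, max_len))

def looks_thin(in_R, alphabet="01", max_len=3):
    """Bounded check of condition (3): whenever u.w and v.w are both in R,
    u and v must be congruent."""
    words = list(strings(alphabet, max_len))
    for u in words:
        for v in words:
            for w in words:
                if (in_R(u + w) and in_R(v + w)
                        and not equivalent(in_R, u, v, alphabet, max_len)):
                    return False
    return True

singleton = lambda w: w.count("1") == 1     # exactly one 1 on the track
at_most_one = lambda w: w.count("1") <= 1   # two distinct accepting classes
```

Here looks_thin(singleton) returns True, while looks_thin(at_most_one) returns
False: the empty string and "1" are both accepted, yet they are separated by
the extension "1".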
Proposition 5. Assume that all restrictions are thin languages. If the automata
of restrictions are bounded in size, then the sizes of the intermediate,
minimized automata in the three-valued decision procedure are the same, to
within an additive constant, as the sizes of corresponding automata under the
conjunctive semantics.

This result is the justification for the practical use of the three-valued
semantics since usually the number of first-order variables in simultaneous use
is quite small. (The size of the additive constant is exponential in the number
of free first-order variables.) And as with the ad hoc semantics,
normalizations are not required for most subformulas, and the automata are,
apart from the $\Sigma^*/{\sim_R}$ parts, the same as those that occur when the
automaton of every subformula is normalized.



4.2 The Six-Valued Representation

We show next how to get rid of the boundedness assumption in Proposition 5.
Define a string $u$ to be interesting if it has (a) some extension $v$, called
an accepting extension, such that $u \cdot v \in L \cap R$ and (b) some
extension $v$, called a rejecting extension, such that
$u \cdot v \in \overline{L} \cap R$. Also, a "don't care" extension is one that
makes a string fall outside $R$. Note that all prefixes of an interesting
string are also interesting. In other words, an uninteresting string cannot be
extended so as to become interesting. The truth-value $i(u)$ denotes whether a
string is interesting. Let $cut(u)$ be the shortest uninteresting prefix of $u$
if such a prefix exists; otherwise, when all prefixes are interesting, $cut(u)$
is defined to be $u$. The membership status $e(u)$ of uninteresting $u$ is
defined by

$$e(u) =
\begin{cases}
1 & \text{if } cut(u) \text{ has an accepting extension} \\
0 & \text{if } cut(u) \text{ has a rejecting extension} \\
\bot & \text{if all extensions of } cut(u) \text{ are "don't care"}
\end{cases} \qquad (4)$$

(These three cases are clearly mutually exclusive.) When $u$ is interesting,
$e(u)$ is defined to be $\chi^3_{L,R}(u)$. Define the sexpartite representation
$\chi^6_{L,R}(u)$ to be $(i(u), e(u))$. The canonical six-valued congruence
$\sim^6_{L,R}$ is defined from the representation as before. Now, an
equivalence class $M$ is either interesting or non-interesting. In the latter
case, there is a value $E \in \mathbb{B}^3$ such that for all $u \in M$,
$e(u) = E$; moreover, for all $v$, $u \cdot v$ is also in $M$. Thus, the
non-interesting equivalence classes are graph-theoretic sinks when
$\Sigma^*/{\sim^6_{L,R}}$ is regarded as a finite-state automaton. There are
between 0 and 3 such classes, depending on $L$ and $R$.

Let $c$ be a natural number and let $\xi : \Sigma^* \to \mathbb{B}$ be a Boolean
characterization of all strings. We say that $\sim$ quasi-refines $\approx$ up
to $c$ under $\xi$ when there are strings $u_1, \ldots, u_c$ such that

$$\forall u, v : \xi(u) \wedge u \sim v \Rightarrow \xi(v) \wedge u \approx v \quad \text{and}$$

$$\forall u, v : \neg\xi(u) \wedge u \sim v \Rightarrow \neg\xi(v) \wedge \exists i : u \approx u_i \wedge v \approx u_i \qquad (5)$$

Thus, $\sim$ respects $\xi$ (that is, it can be mapped through
$\Sigma^*/{\sim}$) and $\sim$ is at least as fine as $\approx$ on strings for
which $\xi$ holds; but, when $\xi$ doesn't hold, strings are mapped to one of
the $c$ designated equivalence classes of $\approx$.

Proposition 6. If $R$ is thin, then $\sim_{L \cap R}$ quasi-refines
$\sim^6_{L,R}$ up to 3 under $\xi(u)$ = "there is an accepting extension of
$u$."

Thus, the six-valued congruence squeezes the parts of $\sim^3_{L,R}$ that
correspond to $\Sigma^*/{\sim_R}$ (as explained after Proposition 4) into at
most three classes.

4.3 Six-Valued Semantics for WS1S and Sexpartite Automata

Under the six-valued semantics, the automaton corresponding to $\phi$
calculates $\chi^6_{L,R}$ by a six-way partition of the states. For
non-interesting strings, it may erroneously calculate a value in {0, 1}, where
the three-valued semantics specifies $\bot$. Consequently, a product with the
automaton for the restriction of a variable must be carried out before the
quantifier elimination in the WS1S-R decision procedure. However, it can be
shown that no minimization is necessary following this step. Let us call the
resulting algorithm the six-valued decision procedure. Thus, we may improve
Proposition 5:

Theorem 1. Assume that all restrictions are thin languages.

1. The sizes of the intermediate, minimized automata occurring during the
   six-valued decision procedure are (to within an additive constant) less than
   those of the conjunctive semantics.

2. The conjunctive automata may be exponentially bigger than the six-valued
   automata.

3. The six-valued decision procedure requires no normalization for products and
   complementations.

5 In Practice 

We showed experimental evidence in [6] that we had found WS1S to be as fast a
way to decide string-theoretic problems as M2L(Str), but only after sometimes
solving by hand state explosion problems like the one discussed in Section 2.2.

Since June 1998, the Mona tool has been based on the three-valued semantics
for WS1S, and our state explosion problems stemming from running M2L(Str)
formulas through WS1S have disappeared. Moreover, with a default restriction
mechanism that we have added to Mona, M2L(Str) formulas can be directly
embedded in WS1S. The running times under this semantics are in all
non-contrived cases the same (to within 5% or so) as for the ad hoc semantics
we used before. (In practice, we used first-order restrictions that are not
thin languages, but which enjoy similar properties.) We have not yet
implemented the six-valued semantics, but there is no reason to expect that it
will not run as fast, while sometimes making intermediate automata smaller.

Thus, we believe that we have established WS1S as the superior choice for a
practical logical notation associated with automata.
Abstract. State-space explosion remains a significant challenge for Finite
State Machine (FSM) exploration techniques in model checking and sequential
verification. In this work, we study the use of sequential ATPG (Automatic
Test-Pattern Generation) as a solution to overcome the problem for a useful
class of temporal logic properties. We also develop techniques to exploit the
existence of synchronizing sequences to reduce some temporal logic properties
to simpler properties that can be efficiently checked using an ATPG algorithm.
We show that the method has the potential to scale up to large,
industrial-strength hardware designs for which current model checking
techniques fail.



1 Introduction 

The state-space explosion problem that challenges Finite State Machine (FSM) explo- 
ration techniques such as CTL temporal logic model checking [McM93] for automa- 
tic formal verification has been intensively studied from various angles. There have 
been numerous efforts to tackle the state-space explosion problem [CGL94]. Techni- 
ques such as compact data structures to represent the state-space [Bry95], on-the-fly 
model checking [Pel96], state-space reduction techniques such as localization reduc- 
tion [Kur94], and navigated model checking [TSNH98] have improved the applicability 
of model checking towards increasingly large designs. 

However, past efforts in alleviating the state-space explosion problem fall short of
making model checking scale up for efficient automatic verification of current, indu- 
strial, hardware designs. Current model checking techniques could fail in several ways 
including failure to extract state-transition relation information from the design structure 
and requiring excessive storage for functional representations of the state-space during 
computation. 

In this work, we study the use of sequential ATPG (Automatic Test-Pattern
Generation) algorithms [ABF90] for model checking a simple class of CTL
formulae. The approach involves the construction, based on the CTL formula, of
a new circuit structure from the circuit to be verified. Model checking is then
cast into detecting a stuck-at fault on the output line of the constructed
circuit. The method avoids building elaborate functional information such as a
complete state transition relation and benefits from a directed, on-the-fly,
structural exploration of the design under verification. Experiments are
performed on benchmark circuits with simple formulae of the form AG EF P to
study the efficiency of state space exploration with a state-of-the-art
sequential ATPG. Furthermore, we show reductions of the form AG EF P to EF P
based on verifying the existence of synchronizing sequences [Koh78,Hen68],
whose application causes an FSM to reach a specific state regardless of the
starting state. The motivation for these reductions stems from the fact that
most machines are synchronizable (at least partially/weakly) with
designer-supplied/ATPG-generated sequences and because they enable efficient
ATPG problem formulations. The reduction results have been verified and
incorporated into the PVS [ORR+96] proof checker.

The rest of the paper is organized as follows: Section 2 discusses the
reduction of simple CTL formulae for synchronizable machines. A discussion of
sequential ATPG algorithms and their application in state-space exploration is
given in Section 3. The method of transforming model checking of simple CTL
formulae to stuck-at-fault testing is explained in Section 4. Experimental
results are discussed in Section 5. Finally, conclusions are summarized in
Section 6.

2 Formula Reduction for Synchronizable FSMs 

Synchronizability of an FSM is used to reduce CTL formulae of the form AG EF P
to EF P. A formal definition of synchronizable machines is given in Section 2.1,
followed by an example illustrating the reduction in Section 2.2. The
formalization of the reduction of CTL formulae and their proofs in the PVS
proof checker are explained in Section 2.3 and Section 2.4, respectively.

2.1 Synchronizable FSMs 

Definition 1 (Synchronizability [Koh78,Hen68]) A machine M is synchronizable if
there exists an input sequence Y that takes M to a specified final state,
regardless of the output or the initial state.

Definition 2 (Initializability [CA89,CA93]) A machine M is initializable with
three-valued logic simulation if there exists an input sequence Y such that the
resulting state of M (evaluated by three-valued simulation) is fully specified
on the application of Y, when the initial state is fully unspecified
(consisting of all Xs and corresponding to the entire state space).
Initializability is thus synchronizability subject to three-valued logic
simulation.

It is important to note that verifying that a given sequence of test vectors is
an initializing sequence is a much simpler task than verifying that the
sequence is a synchronizing sequence. The reason is that while checking for
initializing sequences can be done on the structure of the circuit (using
three-valued logic simulation on the netlist), checking for synchronizing
sequences may often require some form of knowledge and representation of the
state space. For large, industrial designs, the only feasible checks are those
based on initializing sequences.
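The check for an initializing sequence can be sketched in a few lines. The
example below is a minimal illustration, not the procedure of any particular
tool: the two small circuits, the encoding of the unknown value as the string
"X", and all function names are assumptions made for this sketch.

```python
X = "X"  # unknown value in three-valued simulation

def and3(a, b):
    """Three-valued AND: a controlling 0 decides the output."""
    if a == 0 or b == 0:
        return 0
    if a == 1 and b == 1:
        return 1
    return X

def or3(a, b):
    """Three-valued OR: a controlling 1 decides the output."""
    if a == 1 or b == 1:
        return 1
    if a == 0 and b == 0:
        return 0
    return X

def not3(a):
    return X if a == X else 1 - a

def simulate(next_state, n_latches, inputs):
    """Apply `inputs` starting from the all-X state; return the final state."""
    state = (X,) * n_latches
    for x in inputs:
        state = next_state(state, x)
    return state

def is_initializing(next_state, n_latches, inputs):
    """True if three-valued simulation leaves no latch at X."""
    return X not in simulate(next_state, n_latches, inputs)

# Hypothetical circuit A: q1' = x, q2' = q1 AND x.
circuit_a = lambda s, x: (x, and3(s[0], x))

# Hypothetical circuit B: q' = (x AND q) OR (x AND NOT q), logically equal to x.
circuit_b = lambda s, x: (or3(and3(x, s[0]), and3(x, not3(s[0]))),)
```

For circuit A, the sequence [1, 1] is initializing, whereas [1] alone leaves
the second latch unknown. For circuit B, the single input 1 drives every
concrete state to q = 1, so it is a synchronizing sequence, but three-valued
simulation still reports X. This shows why initializability is the more
restrictive (and more cheaply checked) notion.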
Table 1. Example FSM

PS    NS (x=0)    NS (x=1)
A     B           D
B     A           B
C     D           A
D     D           C


2.2 Basic Idea and Example 

Consider the FSM shown in Table 1 [Koh78]. The machine has four states, A, B,
C, and D, and one input x. The first column in the table represents the present
state; the next two columns represent the next states reached upon the
application of input x.

The machine has a synchronizing sequence 01010; this sequence, when applied to
the FSM, synchronizes the machine to state D, regardless of the output or the
initial state. Consider the property $D \models \mathbf{AG\,EF}(C)$. This
property can now be reduced to $D \models \mathbf{EF}(C)$ based on verifying
that there is a synchronizing sequence to the state D (in this example, the
sequence 01010 achieves the objective).
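The claim about the sequence 01010 can be checked mechanically. The following
sketch transcribes the transition table of Table 1 into a dictionary and
applies the sequence from every start state; the function names are
hypothetical and chosen only for this illustration.

```python
# Transition table of the example FSM (Table 1):
# state -> (next state on x=0, next state on x=1).
FSM = {"A": ("B", "D"), "B": ("A", "B"), "C": ("D", "A"), "D": ("D", "C")}

def run(state, bits):
    """Apply the input string `bits` (characters '0'/'1') from `state`."""
    for b in bits:
        state = FSM[state][int(b)]
    return state

def synchronizes_to(bits):
    """Return the common final state if `bits` takes every start state to the
    same state, and None otherwise."""
    finals = {run(s, bits) for s in FSM}
    return finals.pop() if len(finals) == 1 else None
```

Calling synchronizes_to("01010") returns "D", while a sequence such as "0"
returns None because the four start states do not converge.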

An intuitive explanation of the savings possible from this method is presented below. 
Consider any arbitrary state that is reachable from D, say B for illustration. Transferring 
the machine from state B to state C can be performed in the two distinct steps of transfer- 
ring the machine from state B to the synchronized state D followed by transferring the 
machine from state D to state C. Symbolically, this can be represented by the following: 

$$B \rightarrow C \;\equiv\; (B \rightarrow D \,\cdot\, D \rightarrow C).$$

The key is to note that checking for the validity of a given initializing sequence (for 
example, from designers or from ATPG vectors), incurs only the cost of logic simulation. 
We also note the important difference between the use of a synchronizing sequence and 
a reset sequence that is potentially derived from a reset signal in the circuit. While the 
use of reset signals is equally applicable for our result, it may be entirely uninteresting to 
apply a reduction in the formula based on the use of such signals. However, synchronizing
sequences (which are more general than reset sequences), when available, can be used to 
simplify the properties as illustrated. Further, checking of multiple properties of the kind 
illustrated can benefit from a single check for the validity of a synchronizing sequence. 



2.3 Reduction of CTL Formulae: Formalization 

Synchronizability can be formally expressed as a CTL formula: AG EF($s_0$),
where $s_0$ is the specific state to which the FSM is synchronizable. Using the
above CTL formula to represent synchronizability, we state the following
result: if there exists a synchronizing sequence for the FSM, then checking for
the existence of at least one path from the synchronized state on which a
property holds eventually is equivalent to checking for the property to hold
eventually along at least one path from every state. Formally,

Result 1 (AG-EF Reduction)

$$\mathbf{AG\,EF}(s = s_0) \wedge \mathbf{EF}_{s_0}(p) \Rightarrow \mathbf{AG\,EF}(p)$$

where $s_0$ is the specific state to which the FSM is synchronizable and $p$ is
the predicate to be checked. The reduction of the formula into the form EF(p)
is critically helpful in embedding the predicate p directly into the state
justification engine of the sequential ATPG algorithm. A brief description of
the main phases in a sequential ATPG algorithm and the proposed transformation
procedure for obtaining a sequential ATPG problem are discussed later in the
paper.
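Result 1 can be illustrated on the four-state machine of Table 1 with an
explicit-state sketch. The code below computes EF as a least fixpoint over the
transition table and treats AG as quantification over all states, mirroring
the lifting used in the mechanized proof. It is an illustration on a toy
machine, not the ATPG-based procedure, and the function names are
hypothetical.

```python
# Transition table of the example FSM (Table 1):
# state -> (next state on x=0, next state on x=1).
FSM = {"A": ("B", "D"), "B": ("A", "B"), "C": ("D", "A"), "D": ("D", "C")}

def ef(targets):
    """States satisfying EF(targets): least fixpoint of
    Z = targets united with the predecessors of Z."""
    reach = set(targets)
    changed = True
    while changed:
        changed = False
        for s, succs in FSM.items():
            if s not in reach and any(t in reach for t in succs):
                reach.add(s)
                changed = True
    return reach

def ag_ef(targets):
    """AG EF(targets), with AG read as 'for every state of the machine'."""
    return ef(targets) == set(FSM)

# Premises of Result 1 with s0 = D and p = "the state is C".
premises = ag_ef({"D"}) and "D" in ef({"C"})
conclusion = ag_ef({"C"})
```

On this machine both premises hold and so does the conclusion; only the single
EF check from state D depends on the property p.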



2.4 Proof Mechanization of the CTL Formula Reduction 

In this section, we verify and incorporate the model checking reduction results
stated in Section 2.3 into the mechanical theorem prover PVS [ORR+96]. PVS
provides an integrated environment for the development and analysis of formal
specifications and has a powerful theorem prover with a high degree of
automation together with a Binary Decision Diagram (BDD)-based model checker.
The verification of properties is performed by invoking appropriate built-in
proof strategies. The proof strategies consist of a combination of induction,
rewriting, and special-purpose decision procedures such as those for linear
arithmetic and model checking using BDDs.

The proof of Result 1 proceeds by first lifting the CTL operator AG to a
general universal quantification on states in PVS and then expanding the EF
operator into a mu-calculus formula, which is in turn expanded into
least/greatest fixpoint definitions. The proof is completed using definitions
and theorems of least/greatest fixpoint theory. The proof is fully automatic in
PVS after lifting the AG operator to a general universal quantification on
states. It takes a few seconds on a SPARCstation 20 with 32 MB of memory.



3 Sequential ATPG Algorithms and State Space Exploration 

The objective of sequential ATPG algorithms is to generate test sequences that detect 
all the detectable stuck-at faults in a sequential circuit [ABF90]. There is a large body 
of literature available in the area of algorithms to solve the sequential ATPG problem. A 
brief summary of the main steps in a typical sequential ATPG algorithm is now presented. 
A detailed discussion of these algorithms is beyond the scope of this paper. 

A stuck-at-0 (1) fault refers to the value of a line in the circuit being held
at a constant 0 (1) value. A sequential ATPG algorithm attempts to detect every
such fault (two faults for each line in the sequential circuit) in the circuit.
A fault is said to be detected if there exists a sequence that produces
different responses on the good machine and the faulty machine. Varying
requirements on the start state of the sequential circuit are possible (the
machine may be assumed to start either from a completely unknown state or from
a given start state).
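The detection criterion can be made concrete with a small explicit-state
sketch. It searches the product of a good and a faulty machine, started from
a given state, for an input sequence on which the primary outputs differ. The
one-latch circuit and all names are hypothetical, and the breadth-first
enumeration is used only to illustrate the definition; practical sequential
ATPG algorithms avoid this kind of explicit state storage.

```python
from collections import deque

def step(q, x, stuck_at=None):
    """Hypothetical 1-latch circuit: line n = x AND q, primary output z = n,
    next state q' = x OR q. `stuck_at` optionally forces line n to 0 or 1."""
    n = (x & q) if stuck_at is None else stuck_at
    return (x | q), n   # (next state, primary output)

def find_test(start, stuck_at, max_len=8):
    """Breadth-first search over the product of the good and faulty machines
    for an input sequence with differing outputs; None if none is found
    within max_len inputs."""
    queue = deque([(start, start, [])])
    seen = {(start, start)}
    while queue:
        good, bad, seq = queue.popleft()
        if len(seq) == max_len:
            continue
        for x in (0, 1):
            g_next, g_out = step(good, x)
            b_next, b_out = step(bad, x, stuck_at)
            if g_out != b_out:
                return seq + [x]
            if (g_next, b_next) not in seen:
                seen.add((g_next, b_next))
                queue.append((g_next, b_next, seq + [x]))
    return None
```

Starting from q = 0, the stuck-at-0 fault on line n needs two time frames (the
first input sets the latch, the second exposes the fault), whereas the
stuck-at-1 fault is exposed by the first input.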

Typical sequential ATPG algorithms achieve this objective of detecting a fault
by solving three sub-problems: excitation, propagation, and (state)
justification. Excitation refers to the process of identifying an input vector
(on a single time-frame) that can produce a difference between the two circuits
at either a primary output or a flip-flop. The objective of propagation is to
then produce a test sequence that can take a difference value (good/faulty
equal to 1/0 or 0/1) from a flip-flop and propagate it to a primary output.
(State) justification is the process of taking any requirements at the state
lines (flip-flops) that were produced at the excitation phase and justifying
them back to either the starting state or the all-unknowns state. Efficient
methods to solve each of these problems are available in the literature.
Current methods are capable of handling designs with thousands of latches.

The main benefits of using a sequential ATPG algorithm are that there is no explicit 
storage of states required at each time-frame; time-frame expansion is on-the-fly and is 
restricted only to those parts of the design that truly need to be expanded (this is a way to
perform on-the-fly abstraction). The algorithms overcome the need to store all the states 
at each step of navigation through the state space by using decision trees that keep track 
of variables being assigned at specific stages in the program (for example, a latch may 
be given a value of 1 at a time-frame and if that value assignment results in no solution to 
the problem, a value of 0 is reached by backtracking). Hence, this method of searching 
for a requirement in the state space achieves a balance between a purely breadth-first 
state exploration method (as in conventional model checkers) and a purely depth-first 
exploration method (not efficient to explore large state spaces). The state-tuples that the 
method explores at each time frame are usually decided based on effective heuristics to 
determine easily controllable or observable state elements. 
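The decision-tree search with backtracking described above can be sketched as
follows. This is a simplified forward search under assumptions made for
illustration: one input bit per time frame, a fixed decision order, a frame
bound, and a backtrack limit. The 2-bit counter and the names justify and
counter are hypothetical; actual ATPG engines typically justify requirements
backwards and use testability heuristics to order decisions.

```python
def justify(next_state, start, goal, max_frames, backtrack_limit=1000):
    """Depth-first search over per-time-frame input decisions (1 tried first).
    Returns an input sequence reaching a state satisfying `goal`, or None if
    the frame bound or the backtrack limit is exhausted. Only the current
    decision path is kept; visited states are not stored."""
    backtracks = 0

    def search(state, frames_left):
        nonlocal backtracks
        if goal(state):
            return []
        if frames_left == 0:
            return None
        for x in (1, 0):                      # decision, then its alternative
            tail = search(next_state(state, x), frames_left - 1)
            if tail is not None:
                return [x] + tail
            backtracks += 1                   # this decision led nowhere
            if backtracks > backtrack_limit:
                return None
        return None

    return search(start, max_frames)

def counter(state, x):
    """Hypothetical 2-bit counter (y1 low bit, y2 high bit); counts when x=1."""
    y1, y2 = state
    return (y1, y2) if x == 0 else (1 - y1, y2 ^ y1)
```

With three time frames the requirement (y1, y2) = (1, 1) is justified from
(0, 0) by the sequence [1, 1, 1]; with only two frames every decision is
undone by backtracking and the search returns None.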



3.1 Distinguishing Sequence Generation 

Results demonstrating the ability of state-of-the-art sequential ATPG
algorithms to justify and efficiently handle large state spaces (over 1700
latches) have been reported in academic literature. In addition, commercial
sequential ATPG tools are frequently applied to designs consisting of several
thousand latches. We first present the results from a state-of-the-art,
deterministic, test generation algorithm [NP91] (capable of proving the
indistinguishability of the faulty machine from the good machine) and then
present results obtained by using a genetic algorithm-based sequential ATPG
[HRP97] that is extremely efficient for obtaining distinguishing sequences but
is incapable of proving faults undetectable. The fault detection results
reported in that work achieved the highest known detection coverages at that
time.

These results are presented in Tables 2 and 3. For the data in Table 2, the
time limit and backtrack limit for each fault were set to 20 seconds and
100,000, respectively, in each of the circuits except for s35932, because of
the large number of faults in it. For this circuit, a two-second time limit and
a backtrack limit of 10,000 were used. The columns in the table indicate the
circuit name, the number of detected faults, the number of faults proven to be
undetectable by the test generator, the number of aborted faults, the time
taken in seconds, and the number of vectors, respectively. For the data in
Table 3, the columns represent the circuit, checkpoint, number of detected
faults, number of vectors produced, and the time taken, respectively. The
checkpoints refer to varying stages during execution of the genetic algorithm
where computation could be stopped. The entries in bold represented the highest
reported fault coverages at the time. It is clear from the data in these two
tables that sequential ATPG algorithms have the ability to navigate through
large state spaces efficiently to achieve the desired objectives.



Table 2. Sequential ATPG results

Circuit   Detected   Undetectable   Aborted   Time (sec.)   Vectors
s298      265        26             17        389           306
s344      314        8              20        489           117
s400      336        9              81        1888          1644
s420      28         152            275       6235          16
s526      51         17             487       10883         34
s641      404        61             2         89            219
s713      476        105            0         23            177
s820      812        31             7         433           928
s832      816        46             8         500           967
s953      89         990            0         147           14
s1238     1283       72             0         14            478
s1423     555        11             949       20359         88
s1488     1439       27             20        1238          1124
s1494     1439       27             20        1238          1124
s5378     3152       148            1303      27078         949
s35932    34719      3856           519       7172          317



4 Model Checking Using Sequential ATPG 

The transformation of model checking to stuck-at-fault detection can be
performed based on an automata-theoretic approach as illustrated in Figure 1.
Given a temporal logic formula, the transformation constructs monitor automata
and a test network realizing a function that evaluates to "1" iff the monitor
automaton/automata reaches a bad state/states. After generating the network, a
sequential ATPG algorithm can be invoked on the new circuit with the stuck-at
fault to be tested as its objective. Note that transforming model checking to
stuck-at-fault detection in this manner may not be the most efficient. It is
usually more efficient to build the model-checking objectives, such as checking
for the reachability of a bad state in the monitor automaton, into the
implementation of the ATPG algorithm as state justification objectives. We note
again that reducing formulae to forms that permit passing of objectives
directly to the state-justification engine is critically helpful in the
efficiency of this procedure.

Table 3. Sequential ATPG results

Circuit  Ckpt  Det    Vec    Time         Circuit  Ckpt  Det    Vec    Time
s382     1     361    601    1.07 min     s1423    1     1410   2065   13.2 min
         2     362    1285   5.9 min               2     1410   2965   40.1 min
         3     364    1486   8.1 min               3     1414   3943   1.27 hr
s444     1     408    354    38.5 sec     s1494    1     1393   295    5.34 min
         2     420    753    2.3 min               2     1453   540    7.50 min
         3     424    1945   20.1 min              3     1453   540    7.60 min
s526     1     431    486    1.37 sec     s5378    1     3562   2175   4.60 hr
         2     442    1098   8.3 min               2     3607   4461   25.1 hr
         3     454    2642   54.5 min              3     3639   11571  37.8 hr
s713     1     475    157    1.1 min      s35932   1     35100  257    2.1 hr
         2     476    176    1.30 min              2     35100  257    10.2 hr
         3     476    176    1.31 min              3     35100  257    10.9 hr
s820     1     812    572    3.07 min     am2910   1     2190   953    6.25 min
         2     814    590    3.60 min              2     2197   1761   13.5 min
         3     814    590    3.63 min              3     2198   2509   29.4 min
s1196    1     1235   521    1.12 min     div16    1     1727   352    32.0 min
         2     1237   536    1.21 min              2     1810   1168   2.62 hr
         3     1239   574    1.49 min              3     1814   3476   8.1 hr

Any property for which a monitor automaton can be constructed to result in a
test network of manageable size can be checked by such a transformation.
Intuitively, this approach seems ideally suited for checking safety properties,
i.e., those properties every violation of which occurs after a finite execution
of the system. A theoretical characterization of the exact class of properties
that can be transformed effectively into sequential ATPG problems was not
attempted in this paper. Our paper is targeted at studying the efficiency of
sequential ATPG algorithms for state space exploration. Specifically, we have
restricted the properties to be of the form EF F. The general reduction
approach would be based on techniques for constructing monitor automata for
more general properties [Wol82,FTMo83,FTMo85,NFKT87] and may be able to exploit
recent results on constructing smaller automata based on a classification of
safety properties [KV99].



4.1 Example 



The transformation of a property of the form EF F, where F is a conjunction of
value assignments to some signals in the circuit, is shown in Figure 2. The
property checked in the example is EF (y1 = 1 AND y2 = 1). In the example, an
AND gate tying the signals y1 and y2 is added to the original circuit under
verification. Note that no monitor automata need to be constructed for this
example.

Fig. 1. Transformation of model checking to sequential ATPG: resulting circuit
structure (figure label: INPUT)
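The idea of this example can be sketched in code. The sketch assumes a
hypothetical circuit under verification (a 2-bit counter with latches y1 and
y2), treats the output of the added AND gate as a primary output, and uses a
plain breadth-first search in place of a real ATPG engine. Finding an input
sequence that drives the AND output to 1 is the same as finding a test for a
stuck-at-0 fault on that line, which in turn witnesses EF (y1 = 1 AND y2 = 1).

```python
from collections import deque

def next_state(state, x):
    """Hypothetical circuit under verification: a 2-bit counter (y1 low bit,
    y2 high bit) that increments when x is 1 and holds when x is 0."""
    y1, y2 = state
    if x == 0:
        return (y1, y2)
    return (1 - y1, y2 ^ y1)

def monitor(state):
    """Added AND gate tying y1 and y2: the output of the test network."""
    y1, y2 = state
    return y1 & y2

def check_ef(start):
    """Search for an input sequence that drives the AND output to 1, i.e. a
    test for stuck-at-0 on that line. None means the property fails."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, seq = queue.popleft()
        if monitor(state) == 1:
            return seq
        for x in (0, 1):
            nxt = next_state(state, x)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + [x]))
    return None
```

From the start state (0, 0), check_ef returns the witness [1, 1, 1], after
which both latches are 1 and the good circuit outputs 1 where the faulty one
would output 0.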



4.2 Three-Valued Testability and Overspecification



It is important to note that various definitions of untestability have been
discussed in the literature [PR92,PR93,CM93,Bop97]. A detailed discussion of
these definitions is beyond the scope of this paper. However, two of the most
important issues involved in the definition of untestability are briefly
discussed.

First, we consider the notion of three-valued testability. A fault is
three-valued testable iff there exists a test sequence that can produce a
difference (0/1 or 1/0) at a primary output when the good and the faulty
machines are started from the all-unknowns (all Xs) states and three-valued
logic simulation is used to evaluate the output responses. This is the notion
of testability used by most practical gate-level ATPG algorithms operating with
three-valued logic. The set of three-valued testable faults was shown to be a
subset of all testable faults [PR92,PR93].

Secondly, we consider the problem of overspecification [CM92,CM93] present in
some sequential ATPG algorithms. The problem occurs because most gate-level
test generation algorithms for sequential circuits are based on the use of the
time frame expansion technique [ABF90] and the use of combinational test
generation algorithms such as PODEM [Goe90] within each time frame. Some
underlying combinational test generation algorithms, unfortunately, may
overspecify the requirements at present state lines while processing a time
frame (PODEM, for instance, may overspecify the requirements). This, of course,
does not create a problem for combinational circuits, because overspecifying
primary inputs does not affect the applicability of a test vector to the
circuit. However, for sequential circuits, whenever this occurs, the objectives
on the previous time frame may be more specified than necessary and may result
in an incorrect claim by the test generation algorithm regarding the
three-valued testability of the fault.

Fig. 2. Transformation of model checking to sequential ATPG: example (figure
label: Original circuit)

While the loss of accuracy caused by the use of three-valued logic and overspecification
has not been much of a concern for the test generation problem itself (because
it was shown to cause only a small loss of fault coverage in the test generation
process), it has potentially serious implications for the use of some ATPG algorithms
for verification. Untestability characterization and several techniques for
improving the accuracy of the test generation process (for example, based on verifying
the existence of initializing sequences) have been presented earlier [PR93,CM93,
Bop97]. The design verification application must be carefully analyzed before choosing
the appropriate sequential ATPG algorithm.



5 Experimental Results 



Results on state justification experiments for some hard-to-test ISCAS circuits 



Four circuits have been chosen for our experiments on property checking because of
the difficulty they pose to sequential ATPG algorithms. Each of these circuits has
been checked to be initializable using ATPG-generated test sequences. The sequential
ATPG algorithm used in our experiments is HITEC [NP91]. Our experiments on
the ISCAS circuits were run on a SPARCstation 20 with 64MB of memory. A time limit
of 20 seconds and a backtrack limit of 100,000 were set in the ATPG algorithm for
each of the formulae checked.

Table 4. Number of properties successfully checked using VIS and ATPG

            Number of signal assignments per property
            (5 properties for each case)
            2                  3                  5
Circuit     VIS    ATPG        VIS    ATPG        VIS    ATPG
s526        5      5           5      4           5      5 (4+1)
s1423       0      4           0      0           0      0
s5378       0      5 (2+3)     0      4           0      0
s35932      0      4           0      1           0      1

Table 4 shows the experimental results comparing the performance of the ATPG-based
approach with VIS [Gro96]. For each circuit, fifteen properties of the form AG
EF(p_i) were generated and verified against the circuit. For each formula, p_i was
generated by choosing a specific number of internal signals (five properties were
generated for each case with two, three and five signals), specifying Boolean values
for them randomly, and ANDing them together. For example, a case with two signal
assignments could consist of a p_i with (a=1 AND b=0). The table lists the number of
cases, out of the five chosen, for which the formula was successfully proven or
disproved. For entries where numbers are provided in parentheses, the first number in
the parentheses indicates the number of cases for which vectors were obtained and the
second number indicates the number of cases for which no test sequence was available
(proven untestable). As the table shows, the ATPG-based approach is capable of
providing results for certain formulae even for very large circuits.
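The way the random conjunctions are built can be sketched as follows. This is only an
illustration of the procedure described above: the signal names, the seed and the helper
names (random_property, holds, render) are assumptions, not part of the original
experimental setup.

```python
import random

def random_property(signals, k, rng):
    """Pick k distinct internal signals and assign each a random Boolean value."""
    chosen = rng.sample(signals, k)
    return {name: rng.randint(0, 1) for name in chosen}

def holds(prop, state):
    """The conjunction holds in a state iff every chosen signal has its required value."""
    return all(state[name] == value for name, value in prop.items())

def render(prop):
    return " AND ".join(f"{name}={value}" for name, value in sorted(prop.items()))

# Hypothetical internal signal names; a real netlist would supply its own.
signals = [f"n{i}" for i in range(1, 9)]
rng = random.Random(0)

# Five properties for each of the cases with two, three and five signal assignments.
properties = [random_property(signals, k, rng) for k in (2, 3, 5) for _ in range(5)]
```

Each dictionary stands for one conjunction p_i; wrapping it as AG EF(p_i) is then a matter
of the property language of the tool being used.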

It is also interesting to note the differences in the state space exploration strategy in 
the two approaches. Results are shown on the small circuit s526 for which VIS could 
successfully complete the model-checking experiment to compare it with the state space 
exploration strategy in the ATPG-based approach. These results are presented in Table 5. 
Five signals were chosen from the circuit, random Boolean values were assigned to 
these signals and they were tied together by an AND as before. The numbers of vectors 
produced to achieve the required assignment of internal signals and the times required for 
generating these sequences are shown. The number of vectors produced by the ATPG- 
based approach indicates the performance of the structural search (somewhere between 
a DFS and a BFS) as opposed to the BFS-like search (for these types of formulae) 
involved in VIS. We emphasize again, of course, that the ATPG-based approach does not 
need to build the state transition relation and extensive functional representations and 
hence is memory efficient. Even the largest benchmark circuit tried (with 1728 flip-flops) 
required less than 20MB of memory.

Table 5. Differences in state space exploration strategy

            VIS                       ATPG
Circuit     Vectors   Time (sec.)     Vectors   Time (sec.)
s526        21        0.9             132       880.8
s526        48        1.8             240       41.4
s526        1         0.6             3         0.03
s526        8         0.8             50        3.6
s526        1         0.8             3         0.02

Results on property checking experiments on an industrial circuit 

Experiments were also performed on a large industrial circuit to verify the effectiveness
of the proposed sequential ATPG-based property checking system. The design
used was that of an IO controller consisting of five modules: ADDRESS_DECODER,
OUT_CONTROL, READ_CONTROL, IRQ_CONTROL and REG_BANK. The circuit
consisted of 148 flip-flops, 51 primary inputs, 51 primary outputs and 1753 basic cells.

Experiments were performed to identify load sequences for obtaining specific values
at registers embedded deep in the design. The ATPG approach was compared against
a state-of-the-art model checking tool, BINGO [INH96,IN97]. The ATPG approach
was successful in obtaining a sequence for every register tried, while BINGO could not
produce any sequence more than 6 vectors long. BINGO was terminated in each of these
cases because the memory requirement exceeded 500 MB. The ATPG approach required
no more than 20 MB in each case and produced vector sequences of length up to 22.

6 Conclusions 

In this paper we have given an efficient method, based on stuck-at-fault testing
techniques, for the automatic verification of a useful subclass of properties of
synchronizable FSMs, typical of hardware designs. We have presented a reduction of CTL
formulae of the form AG EF F to EF F based on the existence of synchronization
sequences and proven the reduction in the PVS proof checker. Model checking the reduced
formulae is transformed to stuck-at-fault testing and solved by sequential ATPG.

We have shown that the method has the potential to scale up to large hardware designs 
for which current model checking methods fail. The reason our method scales up is 
because it does not involve extracting and computing expensive functional information 
such as the complete state-transition relation. Instead, the approach relies on efficient 
fault-testing that exploits the circuit structure of the hardware design to be verified. As 
part of future work, we plan to characterize and experiment with more general properties 
that can be reduced to the stuck-at-fault testing problem and to investigate methods to 
incorporate the advantages of BDD-based model checking and the ATPG-based approach 
into a unified framework.

Abstract. Abstract state machines (ASMs) provide the basis of a successful
methodology for specification and verification of software and
hardware systems. Nevertheless, computer-aided verification of ASM-programs
has not yet been well developed. In this paper we try to shed
some light on the limits of automatic verifiability of ASM-programs.

We introduce a class of restricted ASM-programs, which are called
nullary programs, and provide an algorithm that decides whether a given
nullary program satisfies a given correctness property (expressible in a
CTL*-like temporal logic) on all inputs. Our decision algorithm runs in
PSPACE and we show that this is optimal. We also show that straightforward
generalizations of nullary programs cannot be verified algorithmically,
as some basic verification problems become undecidable.

1 Introduction 

Abstract state machines (ASMs) [Gur95,Gur97], formerly known as evolving
algebras, provide the formal foundation of a method to design and analyze complex
hardware and software systems. When designing such a system one usually starts
with a high-level description of the system and, by stepwise refining intermediate
stages, eventually obtains a low-level description which is close to executable
code. The ASM-method proposes to describe each stage of the refinement
process in terms of ASM-programs. (That ASM-programs really suffice to express
all levels of abstraction of a dynamic system is witnessed by many large-scale
applications of the ASM-method [BH98].) The advantage of this approach is that
ASM-programs are close to logic (see Theorem 5 in Section 3), which makes
them easily accessible for well-understood mathematical methods. Essentially,
this mathematical foundation of ASM-programs supports the formal verification
of systems designed by means of the ASM-method. For an introduction to the
ASM-method the reader is referred to [Bor95]. Although there do exist numerous
verification examples in the ASM-literature [BH98], one can hardly find an
example where all or part of the verification process is mechanized. That is,
computer-aided verification of ASM-programs has not yet been well developed. In
this paper we investigate the problem of verifying ASM-programs automatically.

In its full generality, automatic verification of programs (not necessarily in
ASM-syntax) is the following decision problem. Given a program $\Pi$ and a
correctness property $\varphi$ (expressed in some appropriate specification formalism),
decide whether for every input $I$ the computation of $\Pi$ on $I$ satisfies $\varphi$.
Obviously, decidability of this problem crucially depends on the expressiveness of
the programming language and the specification formalism one has in mind.

Here, we present a class of restricted ASM-programs and a specification
formalism resembling the branching-time logic CTL* [CES86,Eme90] for which the
above decision problem is decidable, i.e., which can be verified automatically.
We call our programs nullary programs because the main restriction we impose
on ASM-programs is that every dynamic function must have arity 0. (Roughly
speaking, a nullary dynamic function $v$ is nothing but a program variable in the
usual sense. During a computation step the value of $v$, i.e., the interpretation of
the function symbol $v$, may change. This corresponds to assigning a new value to
the ‘program variable’ $v$.) As a possible field of application for nullary programs
we suggest the high-level ASM-descriptions that naturally occur when designing
a complex dynamic system via the ASM-method. The decision algorithm we
provide can then be used to verify such high-level ASM-descriptions.

Aside from possible applications we think that the technique underlying our
decision algorithm is also of independent interest, as in some sense our algorithm
performs symbolic model checking of software. By software we here mean
programs that get a priori unbounded input and whose computations depend on a
‘non-trivial’ part of the input. Nullary programs are software in this sense. For
example, one can write a nullary program $\Pi_R$ that solves the reachability
problem for all finite graphs. Given an arbitrary finite graph with two distinguished
nodes source and target as input, $\Pi_R$ decides whether target is reachable from
source (see Example 3 in Section 3). $\Pi_R$ also indicates that nullary programs go
beyond the scope of finite state systems. One can hardly imagine a finite state
system which ‘faithfully’ represents all computations of a reachability algorithm
on all possible input graphs.

To make our verification technique more precise let us reconsider model
checking. Can we use model checking for automatic verification of programs
(again, not necessarily in ASM-syntax)? That is, is it possible to model-check
whether a given program $\Pi$ satisfies a given correctness property $\varphi$ for all
possible inputs? The answer is yes if there are only finitely many inputs to be checked
and the space (or time) complexity of $\Pi$ is bounded by some function in the
size of the input. For instance, repeat the following steps for each input $I$. First
run $\Pi$ on $I$ and this way obtain the computation graph of $\Pi$ on $I$, i.e., the
graph whose nodes are the reachable configurations of $\Pi$ on $I$ and whose edges
represent transitions from one configuration to a successor configuration. This
graph is clearly finite and can be viewed as a Kripke structure whose labels are
complete descriptions of configurations of $\Pi$. Using standard techniques one can
(model-)check whether $\Pi$ satisfies $\varphi$ on $I$. Since there are only finitely many
inputs we can indeed decide whether $\Pi$ satisfies $\varphi$ on every input. From the
theoretical point of view there is no principal difference between finite state
systems and resource-bounded programs running on a finite number of inputs.
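The input-by-input procedure just described can be sketched in a few lines. The toy
program used here (a counter that advances by 2 modulo the input n) and the property
"some reachable configuration has counter value n - 1" are assumptions made purely for
illustration; they do not come from the paper.

```python
from collections import deque

def computation_graph(initial, successors):
    """Explore all configurations reachable from `initial`; return the edge map."""
    edges = {}
    queue = deque([initial])
    while queue:
        config = queue.popleft()
        if config in edges:
            continue                      # already expanded
        edges[config] = set(successors(config))
        for nxt in edges[config]:
            if nxt not in edges:
                queue.append(nxt)
    return edges

def exists_finally(edges, goal):
    """EF goal at the initial configuration: every key of `edges` is reachable from it."""
    return any(goal(config) for config in edges)

def step(n):
    # Hypothetical program: on input n the single counter moves from c to (c + 2) mod n.
    return lambda c: [(c + 2) % n]

def check(n):
    graph = computation_graph(0, step(n))
    return exists_finally(graph, lambda c: c == n - 1)

# Finitely many inputs: check them one by one.
results = {n: check(n) for n in range(1, 7)}
correct_on_all = all(results.values())
```

The loop over inputs is exactly what stops working once the set of inputs is infinite,
which is the limitation the symbolic approach of this paper is meant to avoid.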

‘Real’ programs, however, are supposed to be correct for infinitely many
inputs, e.g., for all finite graphs. In this case a naive application of model checking
fails simply because one cannot construct the corresponding computation graphs
for all inputs. The main idea in this paper is to avoid an explicit construction
of the computation graphs by translating a given program $\Pi$ into a logical
formula which can be seen as a symbolic representation of all computation graphs of
$\Pi$ (independent of a particular input). Combining this formula with the
correctness property $\varphi$ to be checked, one can reduce the problem of (model-)checking
whether $\Pi$ satisfies $\varphi$ on all inputs to the problem of deciding finite validity of
a logical formula.

We demonstrate the new technique for nullary programs and correctness
properties definable in a specification logic called CGL*, a straightforward
adaption of CTL* for reasoning about computation graphs. It turns out that the
one-step semantics of a given nullary program $\Pi$ can be expressed in terms of
an existential first-order formula. Employing a translation of CTL* into
transitive closure logic (FO+TC) by Immerman and Vardi [IV97], one can combine
this existential formula with an arbitrary CGL*-formula $\varphi$ so that the resulting
(FO+TC)-formula is finitely valid iff all computation graphs of $\Pi$ satisfy $\varphi$. The
latter means that $\Pi$ satisfies $\varphi$ on all inputs. We then observe that finite
validity (resp. finite satisfiability) of the obtained (FO+TC)-formula is decidable
in PSPACE if $\Pi$ takes relational input and $\varphi$ is an existential (resp. universal)
CGL*-formula. Hence, in order to decide whether $\Pi$ satisfies $\varphi$ on all inputs
our algorithm first turns the instance $(\Pi, \varphi)$ of the verification problem into a
(FO+TC)-formula and then decides finite validity of this formula.

After showing this positive result about nullary programs with relational 
input we prove that for nullary programs with functions in their input most basic 
verification problems (like reachability of a safe state and being constantly in safe 
states) become undecidable. This even holds for very simple nullary programs. 
Also, the situation does not change when we restrict attention to relational 
input and instead increase the computational power of nullary programs (e.g., 
by allowing first-order quantifiers in guards or dynamic functions of arity > 0). 

2 Preliminaries 

A vocabulary is a set $T$ of relation and function symbols each associated with an
arity. Nullary function symbols are usually referred to as constant symbols. All
vocabularies we consider here are finite and contain at least the two constant
symbols 0 and 1 (which we usually do not include explicitly). A $T$-structure $\mathcal{A}$
consists of a set $A$, called the universe of $\mathcal{A}$, an interpretation
$R^{\mathcal{A}} \subseteq A^k$ for each $k$-ary relation symbol $R \in T$, and an interpretation
$f^{\mathcal{A}} : A^k \to A$ for each $k$-ary function symbol $f \in T$. We will always assume
that $0^{\mathcal{A}} \neq 1^{\mathcal{A}}$. Fin($T$) denotes the set of all finite $T$-structures.

A $k$-ary query on Fin($T$) is a mapping $Q$ that assigns to every $\mathcal{A} \in$ Fin($T$)
a $k$-ary relation $Q^{\mathcal{A}} \subseteq A^k$ such that the following holds: every isomorphism
between $\mathcal{A}$ and $\mathcal{B}$, $\mathcal{B} \in$ Fin($T$), is also an isomorphism between
$(\mathcal{A}, Q^{\mathcal{A}})$ and $(\mathcal{B}, Q^{\mathcal{B}})$. In the special case $k = 0$ we call $Q$ a
boolean query and view $Q$ as a subset of Fin($T$) closed under isomorphism. As an
example, recall that every first-order formula $\varphi(x_1, \dots, x_k)$ over $T$ (where all
free variables of $\varphi$ occur among $x_1, \dots, x_k$) defines a $k$-ary query on Fin($T$)
mapping $\mathcal{A} \in$ Fin($T$) to
$\varphi^{\mathcal{A}} := \{(a_1, \dots, a_k) \in A^k : \mathcal{A} \models \varphi[a_1, \dots, a_k]\}$.

Transitive closure logic, (FO+TC), is the closure of first-order logic under
the transitive closure operator TC. More formally, (FO+TC)($T$) is the set of all
$T$-formulas derivable from the usual formula-formation rules of first-order logic
(with equality) and the following rule.

(TC) If $\varphi$ is a formula, $\bar{x}$ and $\bar{x}'$ are two $k$-tuples of variables, and $\bar{t}$ and
$\bar{t}'$ are two $k$-tuples of terms, then $[\mathrm{TC}_{\bar{x},\bar{x}'}\, \varphi](\bar{t}, \bar{t}')$ is a formula.

The meaning of $[\mathrm{TC}_{\bar{x},\bar{x}'}\, \varphi](\bar{t}, \bar{t}')$ is as follows. Regard
$[\mathrm{TC}_{\bar{x},\bar{x}'}\, \varphi]$ as a new $2k$-ary relation symbol whose interpretation is the
transitive, reflexive closure of the image of the $2k$-ary query defined by
$\varphi(\bar{x}, \bar{x}')$. If, e.g., $\varphi(x, y) := Exy \vee Eyx$ and $G = (V, E)$ is a directed
graph with two distinguished nodes $s$ and $t$, then
$(G, s, t) \models [\mathrm{TC}_{x,y}\, \varphi](s, t)$ iff there is an undirected path in $G$ connecting
$s$ and $t$. For a formal definition of the semantics of (FO+TC) see, e.g., [EF95].
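On a fixed finite structure the TC operator can be evaluated directly. The following
minimal sketch covers only the case k = 1 and takes the formula as a Python predicate;
the function name and the example graph are illustrative assumptions.

```python
def reflexive_transitive_closure(universe, phi):
    """All pairs (a, b) such that b is reachable from a by zero or more phi-steps."""
    closure = set()
    for start in universe:
        reached = {start}                 # zero steps: the closure is reflexive
        frontier = [start]
        while frontier:
            a = frontier.pop()
            for b in universe:
                if b not in reached and phi(a, b):
                    reached.add(b)
                    frontier.append(b)
        closure |= {(start, b) for b in reached}
    return closure

# Example graph (hypothetical): edges 0 -> 1 and 2 -> 1, node 3 isolated.
V = [0, 1, 2, 3]
E = {(0, 1), (2, 1)}

directed = reflexive_transitive_closure(V, lambda x, y: (x, y) in E)
undirected = reflexive_transitive_closure(V, lambda x, y: (x, y) in E or (y, x) in E)
```

With the symmetric formula Exy or Eyx, nodes 0 and 2 are related because they are joined
by an undirected path through node 1, although neither reaches the other along directed
edges.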

The existential fragment of (FO+TC), (E+TC), is the set of all (FO+TC)-formulas
without occurrence of a universal quantifier and where all negated
subformulas are quantifier-free.

Deciding finite validity and finite satisfiability of (E+TC)-formulas over a
fixed vocabulary will be of particular interest for us. For every vocabulary $T$,
$\mathrm{FINVAL}_T$(E+TC) (resp. $\mathrm{FINSAT}_T$(E+TC)) is the following decision problem.
Given a sentence $\varphi \in$ (E+TC)($T$), decide whether $\mathcal{A} \models \varphi$ for every
$\mathcal{A} \in$ Fin($T$) (resp. for some $\mathcal{A} \in$ Fin($T$)).

Theorem 1. Let $T$ be a vocabulary that contains relation and constant symbols
only. Then both $\mathrm{FINVAL}_T$(E+TC) and $\mathrm{FINSAT}_T$(E+TC) are PSPACE-complete.

3 Nullary Programs 

In this section we introduce nullary programs. Basically, a nullary program is 
a nondeterministic basic ASM-program (in the sense of [Gur97]) where every 
dynamic function (i.e., a function that can be redefined during a computation) 
is nullary. We show that nullary programs have the same expressive power as the 
logic (E+TC). An immediate consequence is that on ordered input structures 
they compute exactly all NLogspace computable functions. 

Let $T$ be a finite vocabulary. A program vocabulary that extends $T$, denoted
$T_p$, is obtained from $T$ by adding some new constant symbols $v_1, \dots, v_k$ to $T$.
Nullary programs over $T_p$ (which we will define below) take finite $T$-structures as
input; we frequently refer to $T$ as the input vocabulary. Each $v_i$ will play the role
of a program variable. The value, i.e., the interpretation, of $v_i$ may change during
a computation step of a nullary program. We call $v_i$ a dynamic (abbreviating
the official ASM-term “nullary dynamic function symbol”).

Definition 2. Let $T_p = T \cup \{v_1, \dots, v_k\}$ be a program vocabulary. Nullary
programs over $T_p$ are defined inductively:

1. Update: For every dynamic $v_i$ and every $T_p$-term $t$ the assignment $v_i := t$
   is a nullary program.

2. Conditional: If $\varphi$ is a quantifier-free $T_p$-formula and $\Pi$ a nullary program,
   then (if $\varphi$ then $\Pi$) is a nullary program (with guard $\varphi$).

3. Parallel execution: If $\Pi_0$ and $\Pi_1$ are nullary programs, then $\Pi_0 \| \Pi_1$ is a
   nullary program. (For readability, $\Pi_0 \| \Pi_1$ is sometimes written with $\Pi_0$
   and $\Pi_1$ one below the other.)

4. Choice: Let $\bar{z}$ be a tuple of variables, $\varphi$ a quantifier-free $T_p$-formula, and
   $\Pi$ a nullary program. If $\exists\bar{z}\,\varphi$ is finitely valid, i.e., if it holds in all
   finite $T_p$-structures for all interpretations of the free variables of
   $\exists\bar{z}\,\varphi$, then (choose $\bar{z} : \varphi$ $\Pi$) is a nullary program (with guard $\varphi$).

(Intuitively, the semantics of (choose $\bar{z} : \varphi$ $\Pi$) is as follows. Choose
nondeterministically values for the variables in $\bar{z}$ so that the guard $\varphi$ is satisfied.
Finite validity of $\exists\bar{z}\,\varphi$ guarantees the existence of such values. The actual
program to be executed is then obtained from $\Pi$ by replacing every occurrence of
$z_i$ in $\Pi$ with the value chosen for $z_i$. Note that in many cases $\varphi := true$ suffices
as guard. However, if for your favorite guard $\varphi$, $\exists\bar{z}\,\varphi$ is not finitely valid,
you can often replace $\varphi$ with $\varphi' := \varphi \vee \bar{z} = \bar{0}$ and sort out ‘invalid’
choices of $\bar{0}$ inside $\Pi$.)

A nullary program is deterministic if it is derivable from the above rules
without using the choice rule. □

The free and bound variables of a nullary program are defined in the obvious
way. For instance, in the nullary program (choose $\bar{z} : \varphi$ $\Pi$) each variable in
$\bar{z}$ occurs bound. We can restrict attention to nullary programs without free
variables if we substitute every free variable by a new constant symbol. For
simplicity we will do so from now on.

The semantics of ASM-programs is usually given by means of update sets 
[Gur97,Gur95]. We define the semantics of nullary programs in a different way, 
which will be more convenient for our purposes. Nevertheless, our semantics 
coincide with the standard semantics. 

Semantics of Nullary Programs. Consider a nullary program $\Pi$ over $T_p$,
$T_p = T \cup \{v_1, \dots, v_k\}$. $\Pi$ takes finite $T$-structures as input. A state of $\Pi$ on
an input $\mathcal{A} \in$ Fin($T$) is a finite $T_p$-structure $(\mathcal{A}, a_1, \dots, a_k)$, where
$a_1, \dots, a_k$ are the interpretations (or values) of the dynamics $v_1, \dots, v_k$,
respectively. The initial state of $\Pi$ on $\mathcal{A}$ is $(\mathcal{A}, 0, \dots, 0)$. As with a Turing
machine program, the ‘program text’ $\Pi$ is viewed as a description of how to modify
the current state of $\Pi$ in order to obtain a possible successor state. Formally, $\Pi$
induces a $2k$-ary relation $\Pi^{\mathcal{A}} \subseteq A^k \times A^k$ so that $(\bar{a}, \bar{a}') \in \Pi^{\mathcal{A}}$
means that if $\Pi$ is currently in state $(\mathcal{A}, \bar{a})$ then in the next step it may change
to state $(\mathcal{A}, \bar{a}')$.

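On the definition of $\Pi^{\mathcal{A}}$. By induction on the construction of $\Pi$, we define
an existential $T$-formula $\varphi_\Pi(\bar{x}, \bar{x}')$, all of whose free variables occur among
$\bar{x} = x_1, \dots, x_k$ and $\bar{x}' = x'_1, \dots, x'_k$. For a better understanding assume that
the interpretation of $x_i$ (resp. $x'_i$) equals the current value of $v_i$ (resp. the value
of $v_i$ in a successor state). We write $[\bar{v}/\bar{x}]$ to denote the substitution of every
occurrence of $v_i$ by $x_i$.

If $\Pi = v_i := t$, let $\varphi_\Pi := (x'_i = t[\bar{v}/\bar{x}])$.

If $\Pi =$ if $\varphi$ then $\Pi_0$, let $\varphi_\Pi := (\varphi[\bar{v}/\bar{x}] \rightarrow \varphi_{\Pi_0})$.

If $\Pi = \Pi_0 \| \Pi_1$, let $\varphi_\Pi := (\varphi_{\Pi_0} \wedge \varphi_{\Pi_1})$.

If $\Pi =$ choose $\bar{z} : \varphi$ $\Pi_0$, let $\varphi_\Pi := \exists\bar{z}(\varphi[\bar{v}/\bar{x}] \wedge \varphi_{\Pi_0})$.

Consider two states $(\mathcal{A}, \bar{a})$ and $(\mathcal{A}, \bar{a}')$ of $\Pi$. Intuitively,
$\mathcal{A} \models \varphi_\Pi[\bar{a}, \bar{a}']$ means that, if $\Pi$ is currently in state $(\mathcal{A}, \bar{a})$ and
$(v_i := t)$ is an update in $\Pi$ (possibly occurring in the scope of guards all of which
are satisfied in $(\mathcal{A}, \bar{a})$), then $a'_i = t^{(\mathcal{A}, \bar{a})}$ is the new value of $v_i$ in
successor state $(\mathcal{A}, \bar{a}')$. $\varphi_\Pi$ describes all those updates which must be
performed in the next step.

$\varphi_\Pi$ is not yet the desired definition of $\Pi^{\mathcal{A}}$. This is because $\varphi_\Pi$ does not
say that dynamics not affected by any update must not change, which is the
intended meaning of $\Pi$. (Suppose, e.g., that $v_i$ does not occur in $\Pi$. Then
$\mathcal{A} \models \varphi_\Pi[\bar{a}, \bar{a}']$ may hold even when $a_i \neq a'_i$.) We fix this as follows. For
every $\Gamma \subseteq \{x_1 = x'_1, \dots, x_k = x'_k\}$ let
$\psi_{\Pi,\Gamma}(\bar{x}, \bar{x}') := \varphi_\Pi \wedge \bigwedge\Gamma$ (where, by convention,
$\bigwedge\emptyset = true$). Call $\Gamma$ maximal w.r.t. state $(\mathcal{A}, \bar{a})$ if
$\mathcal{A} \models \exists\bar{x}'\, \psi_{\Pi,\Gamma}[\bar{a}, \bar{x}']$ and there is no $\Gamma^*$,
$\Gamma \subset \Gamma^*$, such that $\mathcal{A} \models \exists\bar{x}'\, \psi_{\Pi,\Gamma^*}[\bar{a}, \bar{x}']$. Finally,
let $(\bar{a}, \bar{a}') \in \Pi^{\mathcal{A}}$ iff either

- there exists a $\Gamma$ maximal w.r.t. $(\mathcal{A}, \bar{a})$ such that
  $\mathcal{A} \models \psi_{\Pi,\Gamma}[\bar{a}, \bar{a}']$, or

- $\bar{a} = \bar{a}'$ and $\mathcal{A} \not\models \exists\bar{x}'\, \varphi_\Pi[\bar{a}, \bar{x}']$.

In the latter case we say that $\Pi$ is inconsistent in state $(\mathcal{A}, \bar{a})$. If
$(\bar{a}, \bar{a}') \in \Pi^{\mathcal{A}}$ then $(\mathcal{A}, \bar{a}')$ is called a successor state of
$(\mathcal{A}, \bar{a})$. Notice that every state has at least one successor state. If $\Pi$ is
deterministic then every state has a unique successor state.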
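A small interpreter can make the one-step semantics concrete. The sketch below follows
the update-set reading, with which the text says its semantics coincides: collect the
updates that fire in the current state, leave untouched dynamics unchanged, and keep the
state as it is when two updates clash. The class names, the encoding of terms and guards
as Python callables, and the representation of states are all assumptions made for
illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable

# Terms and guards are callables taking (state, env); state maps dynamics to values,
# env maps variables bound by `choose` to values.

@dataclass(frozen=True)
class Update:
    dynamic: str
    term: Callable

@dataclass(frozen=True)
class If:
    guard: Callable
    body: object

@dataclass(frozen=True)
class Par:
    left: object
    right: object

@dataclass(frozen=True)
class Choose:
    variables: tuple
    guard: Callable
    body: object

def update_sets(prog, state, env, universe):
    """Yield every update set (frozenset of (dynamic, value) pairs) prog may produce."""
    if isinstance(prog, Update):
        yield frozenset({(prog.dynamic, prog.term(state, env))})
    elif isinstance(prog, If):
        if prog.guard(state, env):
            yield from update_sets(prog.body, state, env, universe)
        else:
            yield frozenset()             # guard false: no update fires
    elif isinstance(prog, Par):
        for left in update_sets(prog.left, state, env, universe):
            for right in update_sets(prog.right, state, env, universe):
                yield left | right        # both branches read the same current state
    elif isinstance(prog, Choose):
        # Finite validity of the guard is assumed; otherwise nothing is yielded.
        for values in product(universe, repeat=len(prog.variables)):
            new_env = {**env, **dict(zip(prog.variables, values))}
            if prog.guard(state, new_env):
                yield from update_sets(prog.body, state, new_env, universe)

def successors(prog, state, universe):
    """Successor states of `state`, each returned as a sorted tuple of (dynamic, value)."""
    result = set()
    for updates in update_sets(prog, state, {}, universe):
        assigned, consistent = {}, True
        for dynamic, value in updates:
            if assigned.setdefault(dynamic, value) != value:
                consistent = False        # two different values for one dynamic
        new_state = dict(state)
        if consistent:
            new_state.update(assigned)    # an inconsistent step leaves the state unchanged
        result.add(tuple(sorted(new_state.items())))
    return result
```

For instance, the parallel program that assigns the value of y to x and the value of x to
y swaps the two dynamics, because both updates are evaluated in the current state.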

A run of $\Pi$ on $\mathcal{A}$ is an infinite sequence of states such that the first state in
the sequence is the initial state of $\Pi$ on $\mathcal{A}$ and the $(i+1)$-th state is a successor
of the $i$-th state. Every run of $\Pi$ on $\mathcal{A}$ can be embedded in the computation graph
of $\Pi$ on $\mathcal{A}$, denoted $C_\Pi(\mathcal{A})$, which is the finite graph $(S, R, s_0)$ consisting of

- state set $S := \{(\mathcal{A}, \bar{a}) : \bar{a} \in A^k\}$,

- reachability relation $R := \{((\mathcal{A}, \bar{a}), (\mathcal{A}, \bar{a}')) : (\bar{a}, \bar{a}') \in \Pi^{\mathcal{A}}\}$, and

- initial state $s_0 := (\mathcal{A}, \bar{0})$.

Assume that $T_p$ contains the distinguished dynamic accept. We say that $\Pi$
accepts $\mathcal{A}$ if in $C_\Pi(\mathcal{A})$ there exists a path from $s_0$ to a state where the value of
the dynamic accept is 1. $\Pi$ computes a boolean query $Q \subseteq$ Fin($T$) if for every
$\mathcal{A} \in$ Fin($T$), $\Pi$ accepts $\mathcal{A}$ iff $\mathcal{A} \in Q$.

Example 3. Consider the following decision problem known as REACHABILITY.
Given a finite directed graph $G = (V, E)$ and two nodes $s$ and $t$ in $G$, decide
whether there exists a path from source $s$ to target $t$ in $G$. REACHABILITY can
be seen as a boolean query on finite structures of the form $(G, s, t)$. We present
a nullary program $\Pi_R$ that computes this boolean query.

The input vocabulary of $\Pi_R$ is $T := \{E, s, t\}$, where $E$ denotes the binary
edge relation of the input graph and $s$ and $t$ the source and the target,
respectively. (Recall that by our general assumption we also have $0, 1 \in T$.) $\Pi_R$
as defined below is a nullary program over the program vocabulary $T_p :=
T \cup \{mode, pebble, accept\}$; it employs the three dynamics mode, pebble, and
accept. (For readability we use a slightly relaxed syntax and omit parentheses.)

    $\Pi_R$ := if mode = 0 then
                 pebble := s
                 mode := 1
             if mode = 1 then
                 if pebble $\neq$ t then
                     choose z : true
                         if E(pebble, z) then pebble := z
                 else
                     accept := 1

On an input $\mathcal{A} = (G, s, t)$, the states of $\Pi_R$ are $T_p$-structures of the form
$(\mathcal{A}, a_m, a_p, a_a)$, where $a_m$, $a_p$, and $a_a$ are the values of mode, pebble, and
accept, respectively. Initially, $\Pi_R$ is in state $(\mathcal{A}, 0, 0, 0)$. In the first step,
$\Pi_R$ moves to state $(\mathcal{A}, 1, s, 0)$. Then, as long as the value of pebble does not
equal $t$, $\Pi_R$ chooses a node $a$ in $G$, checks whether (pebble, $a$) is an edge in $G$,
and updates pebble with $a$ if so; otherwise it performs no update. If pebble is ever
updated with $t$, $\Pi_R$ accepts by updating accept with 1. In this case $\Pi_R$ becomes
idle; it repeats the accepting state infinitely often. □
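The behaviour of the reachability program of Example 3 on one concrete input can be
simulated directly. The sketch below hard-codes the successor relation of that program
and explores its computation graph; it assumes, for illustration only, that nodes are the
integers 0..n-1 and that the constants 0 and 1 are interpreted as the nodes 0 and 1.

```python
from collections import deque

def successors(state, nodes, edges, s, t):
    """One step of the reachability program; a state is (mode, pebble, accept)."""
    mode, pebble, accept = state
    if mode == 0:
        # Both updates fire in parallel; the mode = 1 branch is not enabled yet.
        return {(1, s, accept)}
    if mode == 1:
        if pebble != t:
            # choose z : true; move the pebble along an edge, otherwise do nothing.
            return {(1, z if (pebble, z) in edges else pebble, accept) for z in nodes}
        return {(1, pebble, 1)}           # pebble has reached t: accept
    return {state}                        # other mode values never occur in a run

def accepts(nodes, edges, s, t):
    """Is a state with accept = 1 reachable from the initial state (0, 0, 0)?"""
    initial = (0, 0, 0)
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if state[2] == 1:
            return True
        for nxt in successors(state, nodes, edges, s, t):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Example input (hypothetical): path 0 -> 1 -> 2 plus an isolated node 3.
nodes = range(4)
edges = {(0, 1), (1, 2)}
```

On this input, accepts(nodes, edges, 0, 2) is true and accepts(nodes, edges, 2, 0) is
false, matching directed reachability in the graph.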

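Lemma 4. For every nullary program $\Pi$ over $T_p$, $T_p = T \cup \{v_1, \dots, v_k\}$, there
is an existential first-order formula $\chi_\Pi(\bar{x}, \bar{x}')$ over $T$ with $2k$ free variables
such that for every $\mathcal{A} \in$ Fin($T$) and all $\bar{a}, \bar{a}' \in A^k$:
$\mathcal{A} \models \chi_\Pi[\bar{a}, \bar{a}'] \Leftrightarrow (\bar{a}, \bar{a}') \in \Pi^{\mathcal{A}}$.
$\chi_\Pi$ can be obtained from $\Pi$ in time polynomial in the size of $\Pi$.

One can view $\chi_\Pi$ in the previous lemma as a symbolic representation of the
reachability relations of all possible computation graphs of $\Pi$ (independent of a
specific input). In fact, for every input $\mathcal{A}$, there exists a path from $s_0$ to
$(\mathcal{A}, \bar{a})$ in $C_\Pi(\mathcal{A})$ iff
$\mathcal{A} \models [\mathrm{TC}_{\bar{x},\bar{x}'}\, \chi_\Pi(\bar{x}, \bar{x}')](\bar{0}, \bar{a})$.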

Theorem 5. A boolean query $Q$ is computable by a nullary program iff $Q$ is
definable in the logic (E+TC).

Proof. (Sketch.) Suppose $\Pi$ computes $Q$. Let $\chi_\Pi(\bar{x}, \bar{x}')$ be obtained from $\Pi$
according to Lemma 4, where $x_i$ (resp. $x'_i$) represents the value of accept. Then
$\exists\bar{x}'([\mathrm{TC}_{\bar{x},\bar{x}'}\, \chi_\Pi(\bar{x}, \bar{x}')](\bar{0}, \bar{x}') \wedge x'_i = 1)$ defines $Q$.
For the other direction assume that the sentence $\varphi \in$ (E+TC)($T$) defines $Q$. There
exists a quantifier-free formula $\psi$ such that $\varphi$ is equivalent to
$[\mathrm{TC}_{\bar{x},\bar{x}'}\, \psi(\bar{x}, \bar{x}')](\bar{0}, \bar{1})$ (see, e.g., [GM96]). Redefine $\Pi_R$ in
Example 3 by replacing pebble, $z$, $s$, $t$, and $E$(pebble, $z$) with $\bar{p}$, $\bar{z}$,
$\bar{0}$, $\bar{1}$, and $\psi(\bar{p}, \bar{z})$, respectively, where $\bar{p}$ is now a sequence of
dynamics. The obtained program is a nullary program over
$T \cup \{mode, \bar{p}, accept\}$ and computes $Q$. □

Immerman [Imm87] showed that on ordered structures a boolean query $Q$
is NLogspace computable iff $Q$ is definable in (E+TC). This gives us the first
part of the next corollary. The second part follows from a result in [GS99].

Corollary 6. Let $Q$ be a boolean query on ordered structures. (1) $Q$ is computable
by a nullary program iff $Q$ is NLogspace computable. (2) $Q$ is computable
by a deterministic nullary program iff $Q$ is Logspace computable.

4 Verifying Nullary Programs 

Verification of nullary programs only makes sense in the context of a specification
formalism suitable to express correctness properties of nullary programs. Since
all runs of a nullary program $\Pi$ on an input $\mathcal{A}$ are embedded in $C_\Pi(\mathcal{A})$ it is
reasonable to express correctness properties of nullary programs as properties of
their computation graphs. Below we present a straightforward adaption of the
branching-time logic CTL* [CES86,Eme90] to the computation graph setting.
The new logic is called CGL* (computation graph logic ‘star’), alluding to CTL*.

Definition 7. Let $T_p$ be a program vocabulary. State formulas over $T_p$ and
path formulas over $T_p$ are defined by simultaneous induction:

(S1) Every sentence in (E+TC)($T_p$) is a state formula.

(S2) If $\alpha$ is a path formula, then E$\alpha$ is a state formula.

(P1) Every state formula is also a path formula.

(P2) If $\alpha$ and $\beta$ are path formulas, then so are $\alpha \vee \beta$, $\alpha \wedge \beta$, and $\neg\alpha$.

(P3) If $\alpha$ and $\beta$ are path formulas, then so are X$\alpha$, $\alpha$U$\beta$, and $\alpha$B$\beta$.

An existential state formula is a state formula which can be derived from the above
rules without using in rule (P2) the clause to form negated formulas.

CGL*($T_p$) (resp. ECGL*($T_p$)) is the set of all state formulas (resp. existential
state formulas) over $T_p$. □

The intuitive meaning of the existential path quantifier E and the temporal
operators X and U is as in CTL*. $\alpha$B$\beta$ stands for “$\alpha$ holds before $\beta$ fails” [IV97].
A formal definition of the semantics of CGL* follows. Let $C = (S, R, s_0)$ be the
computation graph of some nullary program over $T_p$. A run in $C$ is a mapping
$\rho$ from the natural numbers to $S$ such that $(\rho(i), \rho(i+1)) \in R$ for all $i$. Let $\rho|i$
denote the run $\rho'$ defined by $\rho'(j) := \rho(i + j)$. Consider a state formula $\varphi$ and a
path formula $\alpha$, both over $T_p$. Similar to CTL* one defines $(C, s) \models \varphi$ for every
state $s \in S$ and $(C, \rho) \models \alpha$ for every run $\rho$ in $C$ by simultaneous induction on
the construction of $\varphi$ and $\alpha$. The only new cases are

(S1) $(C, s) \models \varphi \;:\Leftrightarrow\; s \models \varphi$ (recall that a state $s$ is a $T_p$-structure)

(P3) $(C, \rho) \models \alpha\mathrm{B}\beta \;:\Leftrightarrow\; \forall i\,((C, \rho|i) \not\models \beta \rightarrow \exists j\,(j < i \wedge (C, \rho|j) \models \alpha))$.

For every $\varphi \in$ CGL*($T_p$) let $C \models \varphi$ iff $(C, s_0) \models \varphi$.
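The clause for the "before" operator B can be checked mechanically on runs of a simple
shape. The sketch below is restricted, as an assumption for illustration, to operands
that are state predicates and to runs given as a finite prefix followed by a non-empty
loop repeated forever; the function name holds_before is hypothetical.

```python
def holds_before(alpha, beta, prefix, loop):
    """Decide alpha B beta on the infinite run prefix + loop + loop + ...

    alpha and beta are predicates on single states. Every state of the run already
    occurs in prefix + loop, so the first position where beta fails (if any) lies
    within that finite part.
    """
    positions = list(prefix) + list(loop)
    for i, state in enumerate(positions):
        if not beta(state):
            # First failure of beta: alpha must hold at some strictly earlier position.
            # Any such position also witnesses the condition for all later failures.
            return any(alpha(positions[j]) for j in range(i))
    return True                           # beta never fails, so the condition is vacuous

# States are plain integers here; alpha: "state is 1", beta: "state is not 3".
alpha = lambda s: s == 1
beta = lambda s: s != 3
```

On the run 0, 1, 2, 3, 3, ... the formula holds because alpha is true at position 1,
before beta first fails at position 3; on 0, 2, 3, 3, ... it does not.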

To give an example of a meaningful CGL*-formula, let us express correctness
of the nullary program $\Pi_R$ in Example 3 in terms of CGL*. More precisely,
we will display a state formula $\varphi_R$ over the program vocabulary of $\Pi_R$, such
that $\Pi_R$ is correct (i.e., $\Pi_R$ computes the boolean query REACHABILITY) iff
$C_{\Pi_R}(\mathcal{A}) \models \varphi_R$ for every input $\mathcal{A}$. The following definition of $\varphi_R$ is
justified by two observations: (1) $\Pi_R$ is correct iff for every input $\mathcal{A}$,
$\mathcal{A} \in$ REACHABILITY iff $C_{\Pi_R}(\mathcal{A}) \models \mathrm{EF}(accept = 1)$ (where
$\mathrm{F}\beta := true\,\mathrm{U}\,\beta$). (2) $\mathcal{A} \in$ REACHABILITY iff
$\mathcal{A} \models [\mathrm{TC}_{x,x'}\, E(x, x')](s, t)$ iff
$C_{\Pi_R}(\mathcal{A}) \models \mathrm{E}([\mathrm{TC}_{x,x'}\, E(x, x')](s, t))$.

$\varphi_R := \mathrm{E}([\mathrm{TC}_{x,x'}\, E(x, x')](s, t)) \leftrightarrow \mathrm{EF}(accept = 1)$.

Hence, one can prove correctness of $\Pi_R$ by verifying $C_{\Pi_R}(\mathcal{A}) \models \varphi_R$ for every
input $\mathcal{A}$.

Verifying Nullary Programs w.r.t. CGL*-Properties. Let $L$ be a sublogic
of CGL*. Verifying nullary programs w.r.t. $L$ means solving the decision problem:

VERIFY($L$): Given a nullary program $\Pi$ and a state formula $\varphi \in L$, both over
the same program vocabulary $T_p$ (that extends some input vocabulary $T$),
does $C_\Pi(\mathcal{A}) \models \varphi$ hold for every $\mathcal{A} \in$ Fin($T$)?

Let $\mathrm{VERIFY}_T$($L$) denote the corresponding problem where the input vocabulary
$T$ is a priori fixed (the program vocabulary $T_p$, however, may still vary).

The complexity of the latter problem is more significant for applications than
that of VERIFY($L$). For instance, assume that in order to solve a computational
problem a nullary program $\Pi$ was put forward which happens not to satisfy
some correctness property $\varphi \in L$. In that case, one usually has to rewrite $\Pi$
(and possibly modify some correctness properties), rather than changing the
computational problem itself (and thus the input vocabulary $T$).

Notice that deciding VERIFY$_T$(CGL*) subsumes symbolic model checking
of CTL*-properties. Every Kripke structure $\mathcal{K}$ (given symbolically in terms of
boolean formulas) and every CTL*-formula $\varphi$ (appropriate for $\mathcal{K}$) can easily be
turned into a nullary program $\Pi_\mathcal{K}$ and a CGL*-formula $\varphi_\mathcal{K}$ such that $\mathcal{K} \models \varphi$ iff
$(\Pi_\mathcal{K}, \varphi_\mathcal{K}) \in$ VERIFY$_{\{0,1\}}$(CGL*).

Recall that ECGL* denotes the existential fragment of CGL* and let ACGL*
be the set of all negated ECGL*-formulas. Our main positive result is:

Theorem 8. Let $T$ be a vocabulary that contains relation and constant symbols
only. Then both VERIFY$_T$(ECGL*) and VERIFY$_T$(ACGL*) are PSPACE-complete.
In other words, given a nullary program $\Pi$ and a correctness property $\varphi \in$
ECGL*, both over the same program vocabulary that extends the fixed $T$, deciding
whether $\Pi$ satisfies $\varphi$ (or $\neg\varphi$) for all inputs is a PSPACE-complete problem.

The restriction to relational input vocabularies in the theorem is essential. 
In the next section we will see that neither of the two verification problems is 
decidable if the input vocabulary contains a unary function symbol. 

Proof. (Sketch.) PSPACE-hardness of both problems is shown via a reduction
from the satisfiability problem for quantified boolean formulas. To prove containment
we reduce VERIFY$_T$(ECGL*) to FINVAL$_T$(E+TC) and VERIFY$_T$(ACGL*)
to FINSAT$_T$(E+TC). The assertion is then implied by Theorem 1. Most of the
reduction work has already been done by Immerman and Vardi ([IV97], Theorem 9),
who defined a translation of CTL* into (FO+TC). If we replace in this
translation $R(y, y')$ (the reachability relation of a given Kripke structure) with
$\chi_\Pi(\bar{y}, \bar{y}')$ (the ‘reachability relation’ induced by $\Pi$ according to Lemma 4) and
replace every variable $y$ (representing a state of the Kripke structure) with a
tuple $\bar{y}$ of variables (representing the dynamic part of a state of $\Pi$), then we
immediately obtain:

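Fact 9 ([IV97]). For every nullary program $\Pi$ and every $\varphi \in$ ECGL*, both
over the same program vocabulary $T_P$, $T_P = T \cup \{v_1, \ldots, v_k\}$, there exists a formula
$\chi_{\Pi,\varphi}(\bar{y}) \in (\mathrm{E{+}TC})(T)$ such that for every $A \in \mathrm{Fin}(T)$ and all $\bar{a} \in A^k$:

$A \models \chi_{\Pi,\varphi}[\bar{a}] \;\Leftrightarrow\; (\mathcal{C}_\Pi(A), (A, \bar{a})) \models \varphi$.

It follows that $(\Pi, \varphi) \in$ VERIFY$_T$(ECGL*) iff $\chi_{\Pi,\varphi}(\bar{0}) \in$ FINVAL$_T$(E+TC)
and that $(\Pi, \neg\varphi) \notin$ VERIFY$_T$(ACGL*) iff $\chi_{\Pi,\varphi}(\bar{0}) \in$ FINSAT$_T$(E+TC). One can
modify the translation by Immerman and Vardi (by introducing new variables)
so that it becomes polynomial-time computable. □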

The space complexity of VERIFY$_T$(ECGL*) and VERIFY$_T$(ACGL*) grows exponentially
in the sum of the arities of relation symbols in $T$. In particular,
VERIFY(ECGL*) and VERIFY(ACGL*) are in EXPSPACE for (non-fixed) relational
input vocabularies with constants. As already pointed out, this complexity
bound is more of theoretical interest since for most applications the number of
input relations as well as their arities will be fixed.

Although ECGL* and ACGL* are only small fragments of CGL*, they still
suffice to express many useful correctness properties. For example, for every
linear-time formula $\alpha$ (i.e., a path formula without path quantifiers) we have
$\mathbf{E}\alpha \in$ ECGL* and $\mathbf{A}\alpha \in$ ACGL*. In particular, common fairness properties like
“impartiality”, “weak fairness”, and “strong fairness” can be expressed in these
fragments (see, e.g., [EL87] and references there). Observe though that the formula
$\varphi_R$ expressing correctness of $\Pi_R$ in Example 3 is neither in ECGL* nor
in ACGL*. Nevertheless, there are formulas definable in ACGL* which imply
partial correctness of $\Pi_R$.

5 On Input with Functions 

A minimal requirement on any automatic verifier for nullary programs is that,
when given a nullary program $\Pi$, it should be able to decide whether $\Pi$ reaches
only ‘safe’ states on every input, or, equally desirable, whether $\Pi$ can reach a
‘safe’ state on every input. Here, safety for a state could mean that a designated
dynamic in $\Pi$ does or does not assume a particular value. This motivates the
definition of two simple verification problems which any automatic verifier for
nullary programs should be able to solve:

ALWAYS SAFE: Given a nullary program $\Pi$ and a dynamic $v$ in $\Pi$, does
$\mathcal{C}_\Pi(A) \models \mathbf{AG}(v = 0)$ hold for every input $A$?

SOMETIMES SAFE: Given a nullary program $\Pi$ and a dynamic $v$ in $\Pi$, does
$\mathcal{C}_\Pi(A) \models \mathbf{EF}(v \neq 0)$ hold for every input $A$?

The next theorem states our main negative result. We call a dynamic $v$ in a
nullary program $\Pi$ boolean if every update of $v$ in $\Pi$ has either the form $v := 0$
or $v := 1$.

Theorem 10. For nullary programs whose input vocabulary contains two non-nullary
symbols, one of which is a function symbol, ALWAYS SAFE and SOMETIMES
SAFE are undecidable. ALWAYS SAFE is already undecidable for deterministic
such programs with two non-boolean dynamics.

Proof. (Sketch.) Consider a sentence $\varphi \in (\mathrm{E{+}TC})(T)$ and let $Q_\varphi$ denote the
boolean query defined by $\varphi$. By Theorem 5 there exists a nullary program $\Pi_\varphi$
computing $Q_\varphi$. Obviously, $\varphi$ is finitely valid iff $Q_\varphi = \mathrm{Fin}(T)$ iff $\Pi_\varphi$ accepts
every $A \in \mathrm{Fin}(T)$ iff $(\Pi_\varphi, accept) \in$ SOMETIMES SAFE. This establishes a reduction
of FINVAL$_T$(E+TC) to SOMETIMES SAFE. A similar argument reduces
FINSAT$_T$(E+TC) to ALWAYS SAFE. The first assertion is now implied by:

Lemma 11. If $T$ contains two non-nullary symbols, one of which is a function
symbol, then both FINSAT$_T$(E+TC) and FINVAL$_T$(E+TC) are undecidable.

The proof of Lemma 11 is by reduction of two undecidable problems for deterministic
finite automata with two input heads (namely the emptiness problem
and its dual, the totality problem) to FINSAT$_T$(E+TC) and FINVAL$_T$(E+TC),
respectively. A straightforward adaptation of the first reduction yields the second
assertion of the theorem. □

Theorem 10 essentially says that nullary programs which assume (arbitrarily
defined) functions in their input cannot be verified algorithmically. But what
if we stick to relational input and increase the computational power of nullary
programs? Following the general ASM framework we may allow first-order quantifiers
in guards or dynamic functions of arity > 0. (A unary dynamic function
$f$, e.g., can occur in an update of the form $f(t) := s$, meaning that in the next
state the value of $f$ at argument $t$ will be updated to $s$.) The proof of the next
corollary is similar to that of the second assertion of Theorem 10.

Corollary 12. If the definition of nullary programs is relaxed in one of the
following two ways and the input vocabulary contains a relation symbol of arity
> 2, then ALWAYS SAFE is undecidable: (1) Allow a single first-order quantifier
to occur in one guard. (2) Allow the usage of one unary dynamic function.

6 Conclusions and Future Work 

We have introduced nullary programs, a class of restricted abstract state machine
programs, and investigated the problem of verifying them automatically.
On the one hand, automatic verification of nullary programs with relational
input (against CTL*-like correctness properties) is PSPACE-complete. On the
other hand, most basic verification problems become undecidable when we admit
arbitrarily defined functions in the input or increase the computational power
of nullary programs in a straightforward manner. Altogether this might suggest
that with nullary programs we are approaching the limit of automatic verifiability
of ASM-programs.

There are several directions for future work. (1) The decision procedures
underlying Theorem 1 form the core of our verification algorithm. Both procedures
perform a semi-naive exhaustive search and hence are not efficient. The
question is whether they can be improved so that we obtain a reasonable performance
in realistic settings. (2) Identify other fragments L of CGL* for which
VERIFY(L) is decidable. To this end investigate finite validity and finite satisfiability
of formulas obtained by Fact 9 when $\varphi$ varies in L. (3) Extend CGL* with
counting constructs. Notice that properties like “$\varphi$ holds in all even moments”
are expressible in (E+TC).

Acknowledgements. I am grateful to Erich Grädel for bringing the subject
of model checking ASMs to my attention and to Eric Rosen for many fruitful
discussions and valuable suggestions.
Abstract. The construction of abstractions is essential for reducing
large or infinite state systems to small or finite state systems. Boolean 
abstractions, where boolean variables replace concrete predicates, are 
an important class that subsume several abstraction schemes. We show 
how boolean abstractions can be constructed simply, efficiently, and pre- 
cisely for infinite state systems while preserving properties in the full 
μ-calculus. We also propose an automatic refinement algorithm which
refines the abstraction until the property is verified or a counterexample 
is found. Our algorithm is implemented as a proof rule in the PVS veri- 
fication system. With the abstraction proof rule, proof strategies combi- 
ning deductive proof construction, model checking, and abstraction can 
be defined entirely within the PVS framework. 



1 Introduction 

When verifying temporal properties of reactive systems, algorithmic methods
are used when the problem is decidable, and deductive methods are employed
otherwise. Algorithmic methods such as model checking are limited by the state
space explosion problem. State space reduction techniques such as symbolic re- 
presentations, symmetry, and partial order reductions have yielded good results 
but the state spaces that can be handled in this manner are still quite modest. 
Deductive methods using theorem proving continue to require a considerable 
amount of manual guidance. While it is clear that any way out of this impasse 
must rely on a combination of theorem proving and model checking, specific 
methodologies are needed to make such a combination work with a reasonable 
degree of automation. It is known that abstraction is a key methodology in com- 
bining deductive and algorithmic techniques. Abstraction can be used to reduce 
problems to model-checkable form, where deductive tools are used to construct 
valid abstract descriptions or to justify that a given abstraction is valid. In this 
paper, we propose a practical verification methodology that is based on a simple,
efficient, and precise form of boolean abstraction generation that preserves
properties in the μ-calculus. We extend the boolean abstraction scheme defined
in [GS97] that uses predicates over concrete variables as abstract variables, to
abstract assertions in the rich assertional language of PVS [OSRSC98]. The PVS
language admits the definition of a fixed point operator that is used to define the
μ-calculus in PVS [RSS95]. With this definition of the μ-calculus in PVS, model
checking implemented as a PVS proof rule can be used as a decision procedure.

Our conservative abstraction scheme is implemented as a proof rule that 
abstracts any PVS formula over concrete state variables and produces a PVS 
formula over abstract state variables. Any assertion expressing a general or tem- 
poral property of a concrete PVS specification is abstracted into a stronger as- 
sertion expressing a property over the corresponding abstract specification. The 
resulting abstract assertion is in a decidable logic, and decision procedures such 
as model checking can be used to discharge it. 

Unlike previous work for the automatic abstraction of infinite state systems
using decision procedures [GS97,CU98,BLO98], our algorithm does not always
over-approximate the transition relation as is done to preserve only universally
quantified path temporal formulas in logics such as ∀CTL. Extensions of the preservation
results [DGG94,CGL94] to the more expressive logic CTL* are defined
using the notion of mixed abstraction which involves multiple next-state relations.
Our algorithm abstracts a μ-calculus formula which is not tied to a single
transition system. Thus, no distinction is made between universal and existen- 
tial fragments. The integration of our abstraction algorithm as a PVS proof rule 
allows us to design powerful proof strategies combining abstract interpretation, 
model checking and proof checking. We also propose an automatic abstraction 
refinement algorithm that is applied when model checking fails. This is done by 
automatically enriching the abstract state with new relevant predicates until the 
property is proved or a counterexample is found. 

The paper is organized as follows. In Section 2 we show how boolean abstractions
can be defined in PVS. In Section 3, we present an efficient abstraction
algorithm for the computation of the “most precise” abstraction, under a given
boolean abstraction, of a predicate over concrete state variables. In Section 4, we
generalize this algorithm to abstract any PVS assertion, including μ-calculus
formulas over concrete state variables, into assertions over abstract state variables.
In Section 5, we present the refinement algorithm.



2 Boolean Abstractions in PVS 

Propositional μ-calculus is an extension of propositional calculus that includes
predicates defined by means of least and greatest fixed point operators, μ and
ν, respectively. It is strictly more expressive than CTL*, which includes both linear
and branching time temporal logics such as LTL and CTL. In [RSS95] a detailed
description of the encoding of the propositional μ-calculus in PVS is presented.
The least fixed point operator is defined as $\mu(F) = \bigcap \{x \mid F(x) \subseteq x\}$, the
predicate that is the greatest lower bound of the pre-fixed points of a monotone
predicate transformer $F$. The temporal operators of CTL, such as AG, AF,
EG, and EF, can be easily defined using their fixed-point characterizations.
When the state space is finite, the predicates can be coded in boolean form
and model checking of μ-calculus formulas can be done using binary decision
diagrams (BDDs).
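
As a rough illustration of these fixed-point characterizations on a finite state space, the following Python sketch computes EF and AG by naive iteration over explicit sets of states. It is not the PVS encoding of [RSS95] and it does not use BDDs; the function names and the four-state transition relation are made up for the example.

```python
def lfp(f):
    """Least fixed point of a monotone set function, iterating up from the empty set."""
    x = frozenset()
    while True:
        nxt = frozenset(f(x))
        if nxt == x:
            return x
        x = nxt


def gfp(f, universe):
    """Greatest fixed point of a monotone set function, iterating down from the universe."""
    x = frozenset(universe)
    while True:
        nxt = frozenset(f(x))
        if nxt == x:
            return x
        x = nxt


def EF(trans, p):
    """EF p = mu Z. p or EX Z (states from which some path reaches p)."""
    return lfp(lambda z: set(p) | {s for (s, t) in trans if t in z})


def AG(states, trans, p):
    """AG p = nu Z. p and AX Z (states from which every path stays inside p)."""
    def step(z):
        return {s for s in p if all(t in z for (u, t) in trans if u == s)}
    return gfp(step, states)


# Hypothetical toy Kripke structure: states 0..3 and a total transition relation.
STATES = {0, 1, 2, 3}
TRANS = {(0, 1), (1, 0), (1, 2), (2, 2), (3, 3)}

if __name__ == "__main__":
    print(sorted(EF(TRANS, {2})))
    print(sorted(AG(STATES, TRANS, {0, 1, 3})))
```

On this toy structure the two results are complementary, which reflects the usual duality AG p = ¬EF ¬p.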

As a simple example, we consider a simple protocol where two processes are 
competing to enter a critical section in mutual exclusion using a semaphore. The 
PVS theory describing the protocol is given as follows. 



semaphore : THEORY
BEGIN

  IMPORTING MU@ctlops

  location : TYPE = {idle, wait, critical}
  state : TYPE = [# pc1, pc2 : location, sem : int #]
  s, s1, s2 : VAR state

  init(s) : bool = pc1(s)=idle and pc2(s)=idle and sem(s)=1

  N(s1,s2) : bool = ...

  safe : THEOREM
    init(s) IMPLIES
      AG(N, LAMBDA s: NOT (critical?(pc1(s)) AND
                           critical?(pc2(s))))(s)

END semaphore



The state is given as a record consisting of two program counters and a semaphore
sem. The expression N(s1,s2) is the transition relation of the protocol. We
are interested in proving that both processes have mutually exclusive access to
the critical section. The property safe is expressed as a CTL property using the
usual operator AG, which is translated into a μ-calculus property. When the
state type is finite, the property can be verified using model checking [RSS95].
In this simple example, sem is of type integer and cannot be encoded with a
finite number of boolean variables, and hence the property cannot be directly
model checked. We propose to extend the capabilities of PVS with a boolean
abstraction mechanism that can conservatively reduce a μ-calculus property of
an infinite state system to model checkable form. In this abstraction, certain
predicates at the concrete level (that might be used in guards, expressions, or
properties) can be replaced by abstract boolean variables. This gives us a general
method for constructing abstractions by evaluating any predicate over the
variables of the program. Since the set of boolean variables is finite, so is the
set of abstract states. Boolean abstraction is defined using a set of predicates
of the form $\lambda(s : state) : \varphi(s)$ over the concrete state type state. An abstraction
of the mutual exclusion protocol can be defined using two predicates
$\lambda(s) : sem(s) \le 0$ and $\lambda(s) : sem(s) > 0$. These predicates define an abstract
state type

abs_state : TYPE = [# pc1, pc2 : location, B1, B2 : boolean #]

where the state components pc1 and pc2 are of finite type and therefore are
not abstracted, and the state component sem referenced by the two predicates
defining the abstraction is encoded with two boolean components B1 and B2
corresponding to the two predicates. In this particular example, these two predicates
happen to be exclusive, but boolean abstractions can be defined more
generally with an arbitrary set of predicates over the concrete state type.
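
To make the mapping from concrete to abstract states concrete, here is a small Python sketch. It assumes the two predicates are sem ≤ 0 and sem > 0 (the reading under which they are exclusive and cover every integer); the class and function names are illustrative and are not part of the PVS theory.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Concrete state: two program counters and an unbounded integer semaphore."""
    pc1: str
    pc2: str
    sem: int


class AbsState(NamedTuple):
    """Abstract state: finite components kept, sem replaced by two booleans."""
    pc1: str
    pc2: str
    B1: bool  # stands for the predicate sem <= 0 (assumed reading)
    B2: bool  # stands for the predicate sem > 0


# The abstraction is determined entirely by this list of predicates.
PREDICATES = (lambda s: s.sem <= 0, lambda s: s.sem > 0)


def abstract_state(s: State) -> AbsState:
    """Evaluate every predicate on s and keep the finite-type components."""
    b1, b2 = (bool(phi(s)) for phi in PREDICATES)
    return AbsState(s.pc1, s.pc2, b1, b2)


if __name__ == "__main__":
    print(abstract_state(State("idle", "idle", 1)))
```

Although sem ranges over all integers, only finitely many abstract states exist, which is what makes the abstract property model checkable.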



3 Efficient Computation of Boolean Abstractions 



Abstract interpretation [CC77] is the general framework for defining abstractions
using Galois connections¹. The domain of the abstraction function $\alpha$ consists of
sets of concrete states, represented by predicates, and ordered by implication.
The range of the abstraction consists of boolean formulas constructed using the
boolean variables $B_1, \ldots, B_k$, ordered by implication. If $X$ ranges over sets of
concrete states and $Y$ ranges over boolean formulas in $B_1, \ldots, B_k$, then the
abstraction and concretization functions $\alpha$ and $\gamma$ have the following properties:

- $\alpha(X) = \bigwedge \{Y \mid X \Rightarrow \gamma(Y)\}$,

- $\gamma(Y) = \bigvee \{X \mid \alpha(X) \Rightarrow Y\}$.

However, we use a simpler and precise concretization function $\gamma$ which consists
simply in substituting each abstract variable $B_i$ by its corresponding predicate
$\varphi_i$ and each abstract state variable abs_s by the corresponding concrete state
variable s. That is,

$\gamma(Y) = Y[\varphi_i(s)/B_i(abs\_s)]$.

We propose to apply boolean abstractions to any predicate (assertion or transi- 
tion relation) written in a rich assertional language. 



Abstraction of assertions. For any predicate $P$ over the concrete variables,
the abstraction $\alpha(P)$ of $P$ can be computed as the conjunction of all boolean
expressions $b$ satisfying the condition:

$P \Rightarrow \gamma(b)$    (1)

¹ A Galois connection is a pair $(\alpha, \gamma)$ defining a mapping between a concrete domain
lattice $\wp(Q)$ and an abstract domain lattice $\wp(Q^A)$, where $\alpha$ and $\gamma$ are two monotonic
functions such that $\forall (P_1, P_2) \in \wp(Q) \times \wp(Q^A).\ \alpha(P_1) \subseteq P_2 \Leftrightarrow P_1 \subseteq \gamma(P_2)$.
Note that there are $2^{2^k}$ distinct boolean truth functions in $k$ variables, and
testing all of these could become very expensive. This set is designated as the
set of test points. An abstraction is precise with respect to the considered abstract
lattice if the set of test points is the entire set of the boolean expressions forming
the abstract lattice. Any over-approximation of $\alpha(P)$ can be computed with
a smaller set of test points for which the implication (1) must be valid. For
example, in [GS97], the abstract lattice considered is the lattice of monomials²
over the set of boolean variables. In this case, it is not necessary to prove (1) for
all the monomials over the set $\{B_1, \ldots, B_k\}$, but only for the atoms $B_1, \ldots, B_k$
and their negations. We can efficiently compute $\alpha(P)$ for any predicate $P$ by
choosing the abstract space as the whole boolean algebra over $B$ or by choosing
a sub-lattice of it and the corresponding test points, using the following fact:

Theorem 1. Let $B = \{B_1, \ldots, B_k\}$ be a set of boolean variables, and let $\mathcal{B}_A$
be the boolean algebra defined by the structure $\langle B, \wedge, \vee, \neg, true, false \rangle$. Let
$\mathcal{D}_B$ be the subset of $\mathcal{B}_A$ containing only literals³ and disjunctions of literals. To
compute the most precise image by $\alpha$ of any set of concrete states $P$ (given as a
predicate), it is sufficient to consider as a set of test points the set $\mathcal{D}_B$ instead
of the whole set $\mathcal{B}_A$ of boolean expressions. That is, testing

$P \Rightarrow \gamma(b)$

for all boolean expressions in $\mathcal{B}_A$ is equivalent to testing this implication only for $b$
in $\mathcal{D}_B$. That is, $2^{2^k}$ tests can be reduced to at most $3^k - 1$ tests.

Proof. Each boolean expression $b$ can be written
in a conjunctive normal form $d_1 \wedge \cdots \wedge d_j$, where each $d_i$ is a disjunction of
literals. Thus, the proof of the implication (1) for each element $b$ can first be
decomposed into simpler proofs $P \Rightarrow \gamma(d_i)$. This implication can be proved for
each $d_i$ by first testing one disjunct, that is a literal, or more than one disjunct
if necessary. That is, only for disjunctions in $\mathcal{D}_B$. ■
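
The counting claim and the equivalence can be checked by brute force for small k. The Python sketch below is only a finite sanity check, not a proof: it represents every boolean expression by its set of satisfying valuations, and it models a set of concrete states P by the set of abstract valuations its states reach. All names are illustrative.

```python
from itertools import combinations, product


def valuations(k):
    """All truth assignments to B_1..B_k, as tuples of booleans."""
    return list(product((False, True), repeat=k))


def clause_points(k):
    """The test points D_B: non-empty disjunctions of literals, one per variable at most."""
    for i in range(1, k + 1):
        for vars_ in combinations(range(k), i):
            for signs in product((True, False), repeat=i):
                yield frozenset(zip(vars_, signs))


def clause_models(clause, k):
    """Valuations satisfying a disjunction of (variable, polarity) literals."""
    return {w for w in valuations(k) if any(w[v] == s for v, s in clause)}


def all_functions(k):
    """Every truth function over k variables, given as its set of models."""
    vals = valuations(k)
    for mask in product((False, True), repeat=len(vals)):
        yield {w for w, keep in zip(vals, mask) if keep}


def image(reached, candidates, k):
    """Conjunction of all candidates b with P => gamma(b), as a set of models."""
    models = set(valuations(k))
    for m in candidates:
        if reached <= m:
            models &= m
    return models


def theorem1_holds(k):
    """Compare the image computed from D_B with the one computed from all of B_A."""
    pts = [clause_models(c, k) for c in clause_points(k)]
    funcs = list(all_functions(k))
    return all(image(r, pts, k) == image(r, funcs, k) for r in funcs if r)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, len(list(clause_points(k))), 3 ** k - 1, theorem1_holds(k))
```

For k = 3 this compares 26 clause test points against all 256 truth functions, for every non-empty set of reachable abstract valuations.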

This theorem gives us an efficient way of computing precise abstractions by
reducing the set of proof obligations from $2^{2^k}$, the number of elements of $\mathcal{B}_A$,
to only $3^k - 1$, the number of elements of the smaller set $\mathcal{D}_B$, and also gives us
an order in which the proof obligations should be generated and proved. In fact,
when the set of predicates $\{\varphi_1, \ldots, \varphi_k\}$ is properly chosen, the actual number
of tests is far fewer than $3^k - 1$. When a proof for any element $b_i$ of the set $\mathcal{D}_B$
succeeds or fails, the number of tests decreases because for
many elements $b_j$ of $\mathcal{D}_B$ the test becomes redundant due to subsumption. Figure 1(a)
shows how the image by $\alpha$ of a set $P(s)$ of concrete states is computed. The
variable $\alpha$ is initialized to true. The variable fail consists of the set of elements
of $\mathcal{D}_B$ that have not been proved to be in the abstraction of $P$. The set fail is

² Monomials are the expressions $\bigwedge_i b_i$ where each $b_i$ is either $B_i$ or $\neg B_i$.

³ A literal is either a boolean variable $B_i$ or its negation $\neg B_i$.
(a) α+(P(s), C)

Initialization
  α := TRUE;
  fail := ∅;
  i := 1;

Iteration
  while i ≤ k do
    𝒟 := disjuncts(i, α);
    while 𝒟 ≠ ∅ do
      let b = choose_in 𝒟 in
      remove b from 𝒟
      If ¬(α ∧ b ∈ fail)
      Then
        If ⊢ P(s) ∧ C ⇒ γ(b)
        Then α := α ∧ b
        Else fail := fail ∪ b
      Else skip
    od
    i := i + 1
  od
  return α

(b) α+(P(s1,s2), C)

Initialization
  α := TRUE;
  fail := {FALSE};
  i := 1; j := 1;

Iteration
  while j ≤ k do
    𝒞 := conjuncts(j, α);
    while i ≤ k do
      𝒟 := disjuncts(i, α);
      while 𝒟 ≠ ∅ ∧ 𝒞 ≠ ∅ do
        let (b1, b2) = choose_in 𝒞 × 𝒟 in
        If ¬(α ∧ (b1 ⇒ b2) ∈ fail)
        Then
          If ⊢ P(s1,s2) ∧ C ∧ γ(b1) ⇒ γ(b2)
          Then α := α ∧ (b1 ⇒ b2)
          Else fail := fail ∪ (b1 ⇒ b2)
        Else skip
      od
      i := i + 1
    od
    j := j + 1
  od
  return α

Fig. 1. Efficient computation of α(P)



initially just the singleton {FALSE}. It is assumed that there has already been a
prior check to ensure that $P(s) \wedge C$ is not equivalent to FALSE. The construction
starts by using disjunctions of length 1, i.e., the literals $B_i$ and $\neg B_i$ for $b$. The
literals $b$ for which the proof obligation $P(s) \Rightarrow \gamma(b)$ succeeds are added to $\alpha$. At
each iteration, when such a proof succeeds, it is possible to eliminate from the
current set of test points the elements for which the test is no longer necessary.
This is done by the test $\alpha \wedge b \in$ fail. For instance, in the first iteration when
we consider only literals, if the proof succeeds for $B_j$, it is not necessary to
test $\neg B_j$. The test for $\neg B_j$ can only fail; otherwise, both $\neg B_j$ and $B_j$ would be
added to $\alpha$, and $\alpha(P)$ would be equivalent to FALSE. In the next iteration, the
test points that are disjunctions of two literals and not already subsumed by the
disjunctions in $\alpha$ are considered. Once again, the successful test points are added
to $\alpha$, $i$ is incremented and the iteration is repeated for disjunctions of length $i$.
The image $\alpha$ of a set of concrete states is computed incrementally and can
be interrupted at any moment, providing an over-approximation of the precise
image. Furthermore, we use additional heuristics to avoid unnecessary tests. For
instance, if the intersection of the set of free variables of $P$ and those of $\gamma(B_i)$ is
empty, it is not necessary to consider the boolean expressions constructed using
$B_i$.
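
The incremental construction can be sketched in Python as follows. This is a simplified stand-in rather than the PVS implementation: the validity checker is replaced by evaluation over a finite sample of concrete states (an assumption made so that the example runs), the context C is omitted, and the fail set is replaced by two direct pruning rules (subsumption by a proved clause, and a literal whose complement is already a proved unit clause). Concrete states are plain integers standing for the value of sem.

```python
from itertools import combinations, product


def clauses_of_length(i, k):
    """Disjunctions of exactly i literals over B_0..B_{k-1}, as sets of (var, polarity)."""
    for vars_ in combinations(range(k), i):
        for signs in product((True, False), repeat=i):
            yield frozenset(zip(vars_, signs))


def holds(clause, bits):
    """Truth value of a disjunction of literals under one abstract valuation."""
    return any(bits[v] == sign for v, sign in clause)


def abstract_image(states, predicates, max_len=None):
    """Return (alpha, tests): proved clauses in CNF and the number of checks performed.

    `states` must be non-empty, mirroring the assumption that P is not FALSE.
    Stopping early (max_len < k) yields an over-approximation.
    """
    k = len(predicates)
    max_len = k if max_len is None else max_len
    # Abstract valuations hit by the sample; this replaces the theorem prover.
    reached = {tuple(bool(phi(s)) for phi in predicates) for s in states}
    alpha, tests = [], 0
    for i in range(1, max_len + 1):
        for b in clauses_of_length(i, k):
            if any(c <= b for c in alpha):
                continue  # subsumed by a clause that is already proved
            units = {next(iter(c)) for c in alpha if len(c) == 1}
            if any((v, not sign) in units for v, sign in b):
                continue  # b reduces to a shorter clause that was handled earlier
            tests += 1
            if all(holds(b, bits) for bits in reached):  # stands for P => gamma(b)
                alpha.append(b)
    return alpha, tests


# Assumed predicates of the semaphore example: B_0 is sem <= 0, B_1 is sem > 0.
PREDICATES = (lambda sem: sem <= 0, lambda sem: sem > 0)

if __name__ == "__main__":
    print(abstract_image([1], PREDICATES))
    print(abstract_image(range(-3, 4), PREDICATES))
```

For the initial condition sem = 1 the sketch proves the two unit clauses for ¬B_0 and B_1 after three checks instead of the eight test points available for k = 2. With no constraint on sem, all eight are checked and the result states that exactly one of the two booleans holds.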
Abstraction of a Transition Relation. Transitions are expressed as general
assertions over a pair of concrete states $(s_1, s_2)$. The abstraction of a predicate
$P(s_1, s_2)$ describing such a transition relation is defined as a predicate
$B(abs\_s_1, abs\_s_2)$ over the abstract pair $(abs\_s_1, abs\_s_2)$. Figure 1(b) shows how
a concrete predicate $P(s_1, s_2)$ representing a transition relation is abstracted.
The algorithm constructs a transition relation over the variables $\{B_1, \ldots, B_{2k}\}$
by constraining the current and the next abstract states. This is done by considering
as the set of test points the set of implications $b_1 \Rightarrow b_2$, where $b_1$ and $b_2$ represent
formulas in the current and the next abstract state variables, respectively. Again,
the abstraction of $P$ is computed incrementally by first constraining the next
state, that is by enumerating the disjunctions $b_2$. When all the proofs fail for
a given choice of $b_1$, the current state is constrained by considering a longer
conjunction for $b_1$. Consider for instance the expression

s2 = s1 WITH [sem := sem(s1) + 1].

This assertion over a pair of concrete state variables (s1, s2) of type state is
abstracted with respect to the predicates $\lambda(s) : sem(s) \le 0$ and $\lambda(s) : sem(s) > 0$
to the following assertion over a pair of abstract state variables $(abs\_s_1, abs\_s_2)$
of type abs_state:

$(B_1(abs\_s_1) \Rightarrow (B_1(abs\_s_2) \vee B_2(abs\_s_2)))$

$\wedge\ (B_2(abs\_s_1) \Rightarrow B_2(abs\_s_2))$

$\wedge\ (\neg B_2(abs\_s_1) \Rightarrow (\neg B_2(abs\_s_2) \vee \neg B_1(abs\_s_2)) \wedge (B_1(abs\_s_2) \vee B_2(abs\_s_2)))$

$\wedge\ (\neg B_1(abs\_s_1) \Rightarrow B_2(abs\_s_2) \wedge \neg B_1(abs\_s_2))$.
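
The abstract assertion can be sanity-checked against the concrete update by direct evaluation. The Python sketch below assumes the predicates are sem ≤ 0 and sem > 0, and it samples a finite range of integers instead of calling a prover; the function names are illustrative.

```python
def B1(sem):
    """Assumed first predicate: sem <= 0."""
    return sem <= 0


def B2(sem):
    """Second predicate: sem > 0."""
    return sem > 0


def implies(a, b):
    return (not a) or b


def abstract_transition(b1, b2, b1n, b2n):
    """The four conjuncts above; b1, b2 are current values, b1n, b2n next values."""
    return (implies(b1, b1n or b2n)
            and implies(b2, b2n)
            and implies(not b2, ((not b2n) or (not b1n)) and (b1n or b2n))
            and implies(not b1, b2n and not b1n))


def covers_concrete_steps(lo=-50, hi=50):
    """Every sampled concrete step sem -> sem + 1 must satisfy the abstract relation."""
    return all(abstract_transition(B1(s), B2(s), B1(s + 1), B2(s + 1))
               for s in range(lo, hi))


if __name__ == "__main__":
    print(covers_concrete_steps())
    # A positive semaphore can never become non-positive by incrementing.
    print(abstract_transition(False, True, True, False))
```

The abstract relation is an over-approximation: it allows a non-positive semaphore either to stay non-positive or to become positive, since the booleans alone cannot tell which happens.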

4 Abstract Interpretation as a Proof Rule 

Our abstraction algorithm computes the most precise over-approximation of an
assertion over concrete states, using a validity checker for the generated assertions.
We implemented this algorithm in the PVS verification system as a
primitive proof rule. Our goal is to approximate a PVS formula over concrete
state variables, that is a PVS boolean expression, by a formula over abstract
state variables. This generated theorem is stronger than the original one. However,
it is expressed in a decidable theory that can be handled by model checking,
BDD simplification, or the ground decision procedures available in PVS. To do
so, we generalize the abstraction algorithm defined in [PH97] for the μ-calculus
to the PVS assertion language and we use our abstraction algorithm to approximate
assertions. This algorithm abstracts propositional μ-calculus formulas using
over-approximation of predicates and under-approximation of negated predicates.
Under-approximation of an assertion is defined as follows:

$\alpha_-(P(s)) = \bigvee \{b \mid \gamma(b) \Rightarrow P(s)\}$



We use only the over-approximation algorithm, relying on the following lemma.

Lemma 1. Let $\varphi$ be a predicate defining a set of states. For every such predicate $\varphi$,

$\alpha_+(\neg\varphi(s)) \Leftrightarrow \neg\alpha_-(\varphi(s))$.
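
The duality can be checked exhaustively on a small finite universe. In the Python sketch below, sets of concrete states are explicit sets of integers and abstract formulas are represented by their sets of satisfying valuations; this finite setting and all names are assumptions made for illustration only.

```python
from itertools import chain, combinations, product


def h(s, predicates):
    """Abstract valuation of one concrete state."""
    return tuple(bool(phi(s)) for phi in predicates)


def alpha_plus(X, predicates):
    """Over-approximation: valuations reached by at least one state of X."""
    return {h(s, predicates) for s in X}


def alpha_minus(X, universe, predicates):
    """Under-approximation: valuations all of whose concretizations lie in X."""
    k = len(predicates)
    return {w for w in product((False, True), repeat=k)
            if all(s in X for s in universe if h(s, predicates) == w)}


def lemma1_holds(universe, predicates):
    """Check alpha_plus(not X) == not alpha_minus(X) for every subset X."""
    universe = set(universe)
    everything = set(product((False, True), repeat=len(predicates)))
    subsets = chain.from_iterable(
        combinations(sorted(universe), r) for r in range(len(universe) + 1))
    return all(
        alpha_plus(universe - set(X), predicates)
        == everything - alpha_minus(set(X), universe, predicates)
        for X in subsets)


# Hypothetical predicates over an integer state; the third one is added for variety.
PREDS = (lambda n: n <= 0, lambda n: n > 0, lambda n: n % 2 == 0)

if __name__ == "__main__":
    print(lemma1_holds(range(-2, 3), PREDS))
```

Because of this duality, a procedure for over-approximation alone is enough: an under-approximation of a predicate is obtained by negating the over-approximation of its negation.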

We now formally define the abstraction function $[\![\ ]\!]$, which approximates a PVS
boolean expression $f$ such that $[\![ f ]\!]^+$ denotes an over-approximation of $f$, and
$[\![ f ]\!]^-$ an under-approximation of $f$. We also use a context $c$ consisting of a
PVS formula that is valid at the PVS subformula that is being approximated.
The intuition behind using such a context expression is that when an expression
$e_1 \wedge e_2$ is being abstracted, one can assume that $e_1$ is valid when abstracting $e_2$
and vice versa. The context when omitted is just the boolean constant TRUE.
$[\![ f ]\!]^{\pm}_c$ denotes the approximation of $f$ under the context $c$.

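Approximation of PVS assertions. The abstraction function $[\![\ ]\!]$ is defined recursively
on the structure of the PVS assertion language as follows.

propositions:  $[\![ e_1 \wedge e_2 ]\!]^{\pm}_c \longrightarrow [\![ e_1 ]\!]^{\pm}_{c \wedge e_2} \wedge [\![ e_2 ]\!]^{\pm}_{c \wedge e_1}$

quantifiers:   $[\![ \exists (s) : e ]\!]^{\pm}_c \longrightarrow \exists (abs\_s) : [\![ e ]\!]^{\pm}_c$
               $[\![ \forall (s) : e ]\!]^{\pm}_c \longrightarrow \forall (abs\_s) : [\![ e ]\!]^{\pm}_c$
               $[\![ \lambda (s) : e ]\!]^{\pm}_c \longrightarrow \lambda (abs\_s) : [\![ e ]\!]^{\pm}_c$

fixpoints:     $[\![ \mu/\nu(\lambda(Q) : F(Q)) ]\!]^{\pm}_c \longrightarrow \mu/\nu(\lambda(abs\_Q) : [\![ F(Q) ]\!]^{\pm}_c)$

atoms:         $[\![ e(s) ]\!]^{+}_c \longrightarrow \alpha_+(e(s), c)$
               $[\![ e(s_1,s_2) ]\!]^{+}_c \longrightarrow \alpha_+(e(s_1,s_2), c)$
               $[\![ e(s) ]\!]^{-}_c \longrightarrow \alpha_-(e(s), c)$
               $[\![ e(s_1,s_2) ]\!]^{-}_c \longrightarrow \alpha_-(e(s_1,s_2), c)$
               $[\![ \varphi_i(s) ]\!]^{\pm}_c \longrightarrow B_i(abs\_s)$

constants:     $[\![ e ]\!]^{\pm}_c \longrightarrow e$   (subject to a condition on the free variables of $e$)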



The following theorem establishes the fact that the abstraction provides, respectively,
an over- and an under-approximation of any PVS boolean expression.

Theorem 2. Let $f$ be a PVS assertion and $[\![\ ]\!]$ the abstraction function. We have:

$\vdash f \Rightarrow \gamma([\![ f ]\!]^+)$ and $\vdash \gamma([\![ f ]\!]^-) \Rightarrow f$

Proof. The proof is established by induction on the structure of the assertion
$f$. It is easy to show that by the definitions of $\alpha_+$ and $\alpha_-$, both implications
hold when $f$ is an atom. The other cases can be deduced by monotonicity of the
logical connectives and the fixed point operators. ■

The soundness of the abstraction function is established by the following
theorem.

Theorem 3 (preservation). Let $[\![\ ]\!]$ be the abstraction function defined above
as a boolean abstraction, and let $f$ be any PVS boolean formula. Then

$\vdash [\![ f ]\!]^-$ implies $\vdash f$

Theorem 2 ensures that for an assertion $f$, the abstraction algorithm produces
a stronger assertion $\gamma([\![ f ]\!]^-)$. Note that $\vdash [\![ f ]\!]^-$ trivially implies $\vdash \gamma([\![ f ]\!]^-)$,
which then justifies the preservation result of Theorem 3.

The abstraction algorithm where a formula $f$ is under-approximated is implemented
as a PVS proof rule abstract. This atomic proof rule takes a goal
given by a PVS formula (a μ-calculus formula) and a set of state predicates, and
translates this to a propositional formula (a propositional μ-calculus formula)
which is returned as a new goal. This goal can be discharged using any other
PVS proof command including BDD simplification and model checking.

We have defined a PVS proof strategy that carries out a sequence of inference
steps that simplify goal formulas by rewriting all definitions, including constant
definitions such as the temporal operators of the logic CTL in terms of the μ
and ν operators, and applies the abstraction function on the resulting goal.



∀ (s : state) :
  init(s) ⊃
  ¬μ. λ (Q : pred[state]) :
    (λ (u : state) :
      (¬λ s :
        ¬(critical?(pc1(s)) ∧
          critical?(pc2(s))))
      (u) ∨
      ∃ (v : state) :
        (Q(v) ∧ N(u,v)))(s)

∀ (abs_s : abs_state) :
  ¬⟦init(s)⟧⁺ ∨
  ¬μ. λ (abs_Q : pred[abs_state]) :
    (λ (abs_u : abs_state) :
      (¬λ abs_s :
        ¬(critical?(pc1(abs_s)) ∧
          critical?(pc2(abs_s))))
      (abs_u) ∨
      ∃ (abs_v : abs_state) :
        (abs_Q(abs_v) ∧ ⟦N(u,v)⟧⁺))(abs_s)

Fig. 2. An example of abstraction for a PVS assertion



Figure 2 shows how the μ-calculus formula corresponding to the theorem
safe presented in the PVS theory semaphore in Section 2 is approximated. The
property of mutual exclusion $\lambda(s) : \neg(critical?(pc1(s)) \wedge critical?(pc2(s)))$ is
expressed as an invariance property. As expected for such properties, the initial
state and the transition relation are over-approximated. For instance, we have

$[\![ init(s) ]\!]^+ \longrightarrow idle?(pc1(abs\_s)) \wedge idle?(pc2(abs\_s)) \wedge \neg B_1(abs\_s) \wedge B_2(abs\_s)$

We have tried other examples including a simple snoopy cache-coherence
protocol with an arbitrary number of processes [Rus97] and a variant of the
alternating-bit communication protocol called the bounded retransmission protocol
[HS96]. The main invariant of the cache coherence protocol is proved by an
abstraction defined in terms of five predicates. The preservation of the invariant
is then proved by abstraction and BDD-based propositional simplification.

The bounded retransmission protocol is verified using an abstraction also
defined in terms of five predicates. The construction of the abstract description
takes about 100 seconds in PVS. The resulting abstract assertion is discharged
using model checking. In contrast, Havelund and Shankar’s verification [HS96] of
this example required 57 invariants to justify the validity of a manually derived
abstraction.



5 Refining an Abstraction 

The abstraction proof rule is used in PVS to generate new goals that depend
only on finite state variables. Such goals can be discharged using a PVS proof
rule such as the BDD simplifier or the μ-calculus simplifier. However, model
checking on the new goal can fail because the abstraction is too coarse. It is
then necessary to refine the abstraction using a richer abstract domain. Since
our abstraction algorithm presented in Section 4 allows us to compute the most
precise abstraction with respect to the predicates $\varphi_1, \ldots, \varphi_k$, refining the abstraction
requires additional predicates. The refinement algorithm takes as arguments
the original PVS assertion $f$, a new list of predicates $\varphi_{k+1}, \ldots, \varphi_l$, and a context
computed previously. The context is a hash table which associates to each
atom the BDD representing its abstraction, that is the BDD $\alpha$, and the set fail
of BDDs. The refinement algorithm descends through the structure of $f$ and
refines each sub-formula with the new predicates. The refinement algorithm is
similar to the algorithm computing $\alpha_+(P)$ of Figure 1. However, the variables $\alpha$
and fail are initialized with their already computed values. This allows us to
take advantage of the success or failure of already executed proofs. The new set
of test points is defined as the disjunctions formed using the literals $B_{k+1}, \ldots, B_l$
and their negations. This set is augmented with the boolean expressions over the
old variables $B_1, \ldots, B_k$ for which the proof previously failed. The algorithm
returns a more precise approximation of $f$.

We implemented our abstraction and refinement algorithms as a proof
strategy defining a semi-decision procedure that abstracts an original PVS formula
and then applies model checking. If model checking fails, the abstraction is
refined until model checking succeeds. This strategy is expressed as follows in the
PVS strategies language:

(TRY (THEN (abstract (phi1 ... phik)) (model-check))
     (skip)
     (REPEAT
       (LET (new-list-of-predicates)
         (THEN (refine F) (model-check)))))

Our refinement algorithm tries to eliminate as much as possible of the
nondeterminism created by the over-approximation of the transition relation. Absence
of nondeterminism can be easily detected by checking that, when the abstraction
of a transition is computed, the index i never reaches a value
greater than 1. For instance, the abstraction of the assertion

$e(s_1, s_2) = (s_2 = s_1\ \mathrm{WITH}\ [sem := sem(s_1) + 1])$

presented in Section 3 is nondeterministic since it contains the conjunct

$(B_1(abs\_s_1) \Rightarrow (B_1(abs\_s_2) \vee B_2(abs\_s_2)))$.

Refining such an abstraction involves translating the predicate characterizing
the next state, that is $(B_1(abs\_s_2) \vee B_2(abs\_s_2))$, into a disjunctive normal form.
Then, for each disjunct, the pre-image is computed with respect to the concrete
assertion $e(s_1, s_2)$. In this particular case, the pre-images for $B_1(abs\_s_2)$ and
$B_2(abs\_s_2)$ are, respectively, $\exists(s_2) : e(s, s_2) \wedge \varphi_1(s_2)$ and $\exists(s_2) : e(s, s_2) \wedge \varphi_2(s_2)$.
Their simplified forms are, respectively, $sem(s) < 0$ and $sem(s) = 0$.

6 Conclusion 

We have presented a general abstraction/refinement algorithm that preserves
the full $\mu$-calculus as the basis for an integration of abstract interpretation,
model checking, and proof checking. We have implemented this boolean abstraction
algorithm as an extension to the PVS theorem prover. This allows us to define
powerful proof strategies combining deductive proof, induction, abstraction, and
model checking within a single framework. It also allows our abstraction
algorithm to be used in the framework of a richly expressive specification language
encompassing finite, infinite-state, and parametric systems. The computation of
the abstraction is completely automatic, and uses the PVS decision procedures
to test the generated implications.

We are currently investigating cases where it is possible to detect whether a
constructed abstraction preserves fragments of the $\mu$-calculus so that
abstract counterexamples yield concrete ones. This is done by finding sufficient
conditions allowing us to use the various preservation results presented in [LGS+95,
DGG94].

The new PVS version includes code generation capabilities, and as future
work, we plan to define abstraction construction in the PVS specification
language, and to automatically extract the code implementing the abstraction
operation. Such experiments are similar to the ones presented in [vHPPR98] where,
for instance, the code implementing a BDD simplifier is extracted automatically
from its formal specification.
Deciding Equality Formulas by Small Domains

Abstract. We introduce an efficient decision procedure for the theory
of equality based on finite instantiations. When using the finite
instantiations method, it is a common practice to take a range of [1..n] (where
n is the number of input non-Boolean variables) as the range for all
non-Boolean variables, resulting in a state-space of $n^n$. Although various
attempts to minimize this range were made, typically they either
required various restrictions on the investigated formulas or were not very
effective. In many cases, the $n^n$ state-space cannot be handled by BDD-based
tools within a reasonable amount of time. In this paper we show
that significantly smaller domains can be algorithmically found, by
analyzing the structure of the formula. We also show an upper bound for
the state-space based on this analysis. This method enabled us to verify
formulas containing hundreds of integer and floating point variables.

Keywords: Finite Instantiation, equality logic, uninterpreted functions,
compiler verification, translation validation, Range Allocation.

1 Introduction 

Automated validation techniques for formulas of the theory of equality become
increasingly important as the advantages of abstraction and the use of
uninterpreted functions (UIFs) become more evident. UIFs are mainly useful when
proving equivalence between two models. Proving design equivalence or
comparing a specification to an implementation are two typical examples of such
equivalence proofs. In our case, we proved equivalence between source and
target code serving as the input and output of a compiler, and thus verified that
the compilation process was correct (see [PSS98b], [PSS99] and [Con95] for more
details about this project).

When verifying equivalence between two formulas, it is often possible to
abstract away all functions, except the equality sign and Boolean operators, by
replacing them with UIFs. An abstracted formula holds less information and
therefore can be represented by a significantly smaller BDD. It was Ackermann
[Ack54] who first showed the reduction of such abstracted formulas to
function-free formulas of the theory of equality, while preserving validity. He suggested
doing so by replacing each occurrence of a function with a new variable, and
adding constraints that preserve their functionality as an antecedent of the
formula, rewriting the formula $(z = F(x, y) \wedge u = y) \rightarrow z = F(x, u)$ into
$((x = x \wedge y = u) \rightarrow f_1 = f_2) \rightarrow ((z = f_1 \wedge u = y) \rightarrow z = f_2)$.

...active Systems, a gift from Intel, a grant from the U.S.-Israel bi-national science
foundation, and an Infrastructure grant from the Israeli Ministry of Science and the
Arts.

The abstraction process itself does not preserve validity and may transform
a valid formula such as $x + y = y + x$ into the invalid formula $F(x, y) = F(y, x)$,
which does not hold for all functions F. However, in many useful contexts, such
as the verification of compilers which do not perform extensive arithmetical
optimizations, the process of abstraction is often justified. At least we can rely on
the fact that the process of abstraction into UIFs never generates false positives,
and that if the abstract version is found valid, this is also the case with the
concrete formulas it abstracts.

After performing such an abstraction followed by Ackermann's reduction, the
resulting formula is an equality formula, and enjoys the small model property
(i.e. it is satisfiable iff it is satisfiable over a finite domain). Therefore, the next
step is the calculation of a finite domain, such that the formula is valid iff it is
valid over all interpretations of this finite domain. The latter can be checked with
a finite state decision procedure. A known 'folk theorem' is that it is enough to
give each variable the range [1..n] (where n is the number of non-Boolean input
variables), resulting in a state-space of $n^n$. It is not difficult to see that this range
is sufficient for preserving the validity or invalidity of the formula. If a formula
is not valid, there is at least one assignment that makes the formula false. Any
assignment that partitions the variables into the same equivalence classes will
also falsify the formula (the absolute values are of no importance). Since there
cannot be more than n classes, the [1..n] range is sufficient regardless of the formula's
structure. In this paper we will show that analyzing the formula's structure can
lead to significantly smaller domains. For example, a trivial improvement is to
construct a graph whose vertices are the formula's non-Boolean variables, and
whose edges represent the comparisons between them. Then, instead of giving a
range of [1..n] to all variables, give to each variable the range [1..k], where k
is the size of the component it belongs to ($k \leq n$). Experiments with 'real-life'
problems have shown us that this simple partitioning can be very effective.
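
The component-based partitioning just described can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names `component_ranges` and `state_space`, and the five-variable example, are hypothetical.

```python
from collections import defaultdict


def component_ranges(variables, comparisons):
    """Map each variable to k, the size of its connected component in the
    comparison graph; the variable then ranges over [1..k]."""
    adj = defaultdict(set)
    for a, b in comparisons:
        adj[a].add(b)
        adj[b].add(a)
    size = {}
    for v in variables:
        if v in size:
            continue
        # Depth-first search collecting the component of v.
        comp, stack = {v}, [v]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
        for x in comp:
            size[x] = len(comp)
    return size


def state_space(sizes):
    """Product of the range sizes of all variables."""
    total = 1
    for k in sizes.values():
        total *= k
    return total


# Hypothetical example: a, b, c are compared with each other; d, e likewise.
variables = ["a", "b", "c", "d", "e"]
comparisons = [("a", "b"), ("b", "c"), ("d", "e")]
sizes = component_ranges(variables, comparisons)
print(sizes, state_space(sizes), len(variables) ** len(variables))
```

In this toy case the uniform [1..n] range spans 5^5 = 3125 states, while the per-component ranges span 3 * 3 * 3 * 2 * 2 = 108.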

Hojati et al. ([HIKB96], [HKGB97]) tried to avoid the [1..n] range by first
considering the explicit DNF of the formula. Given the formula in this form, they
'colored' the comparison graph of each clause (a graph based on the
disequalities in the formula) and chose the maximal chromatic number (the number of
colors needed for coloring the graph) as the range for each variable. As a second
step, they tried to approximate the maximum number of disequalities needed to
satisfy the formula, in a general formula. Given that number, a uniform range
of [1..k] is sufficient, where k is calculated on the basis of this number. It seems
that finding a good approximation is very hard. Although several heuristics are
suggested, it is unclear how well they work. They also indicated an inherent
problem with finding a good BDD ordering in the presence of Ackermann constraints
(the stripping assertions in the notation of their paper).

Sajid et al. [SGZ+98] proposed a different approach. Since non-Boolean
variables appear in the formula only when compared to one another, they suggest
encoding each such comparison with a new Boolean variable, and ensuring
transitivity of equality by restricting the BDD traversal accordingly. Although this
traversal procedure is proved by the authors to be worst-case exponential, it
proved to be more efficient than finite instantiations with the [1..n] range.

Even with this range, which we show in this paper is not tight, it is not
always the case that this kind of encoding results in a smaller state-space (as
was mentioned by the authors themselves). Consider, for example, a formula
where all variables are compared to each other (graphically, this is a clique of
n vertices). In this case, $n \cdot (n - 1)/2$ new Boolean variables will be introduced,
each represented by a BDD variable. Finite instantiations with a range of [1..n],
on the other hand, will require only $n \cdot \log n$ BDD variables.
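
The two counts for the clique case can be compared numerically. The sketch below is only an illustration; it assumes a binary encoding with $\lceil \log_2 n \rceil$ BDD variables per non-Boolean variable, and the function names are ours.

```python
def pairwise_encoding_vars(n):
    """One Boolean variable per compared pair in a clique of n variables."""
    return n * (n - 1) // 2


def finite_instantiation_vars(n):
    """n variables, each encoded in ceil(log2(n)) bits (assumed binary encoding)."""
    bits = (n - 1).bit_length()  # equals ceil(log2(n)) for n >= 1
    return n * bits


for n in (4, 16, 150):
    print(n, pairwise_encoding_vars(n), finite_instantiation_vars(n))
```

For small cliques the two encodings are close, but the quadratic count grows much faster than $n \cdot \log n$ as n increases.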

In a more recent work, Bryant, German and Velev [BGV99] restricted the
logic to formulas that contain positive equalities only, i.e. the outcome of any
equality test between terms can only be part of a monotonically positive
Boolean formula. This restriction disallows the use of the outcome of equalities in
control decisions. Given this restricted logic, they were able to substitute UIFs
with unique constants that serve as 'witnesses' in case the formula is false. This
replacement naturally reduced the state-space immensely, and made the decision
procedure highly efficient. Although they chose the same case study examined
by [SGZ+98], the results are not given in a way that they can be compared.

The formulas we consider here are not restricted to positive equalities. They
are implications between two conjunctions of clauses, typically with several
thousand clauses on each side, and more than a thousand variables. The abstraction
process adds several hundred more variables (hundreds of which are integer and
floating-point) and thousands of constraints. Although we decompose the
formula, we still have many verification conditions with more than 150 integer
variables. Since the size of the domain is crucial to the time required to
complete the proof with a BDD-based tool, the $n^n$ state-space (where n > 150 in
our case) is naturally far too large to handle.

In the next section, we present a precise definition of the problem we consider:
deciding validity (satisfiability) of equality formulas, and explain how it naturally
arises in the context of translation validation. In Section 3 we outline our general
solution strategy, which is a computation of a small set of domains (ranges) R
such that the formula is satisfiable iff it is satisfiable over R, followed by a test for
R-satisfiability performed by a standard BDD package. The remaining question
is how to find such a set of small domains. To answer this question, we show how
it can be reduced to a graph-theoretic problem. The rest of the paper focuses on
algorithms, which, in most cases, produce tractably small domains. In Section 4,
we describe the basic algorithm. The soundness proof of the algorithm is given in
Section 5. In Section 6, we present several improvements to the basic algorithm,
and analyze their effect on the upper bound of the resulting state-space. We
describe experimental results from an industrial case study in Section 7, and
conclude in Section 8 by considering possible directions for future research.

2 The Problem: Deciding Equality Formulas 

Our interest in the problem of deciding equality formulas arose within the context
of the Code Validation Tool (CVT) that we developed as part of the European
project Sacres. The focus in this project is on developing a methodology and a
set of tools for the correct construction of safety critical systems.

CVT is intended to ensure the correctness of the code generator incorporated
in the Sacres tools suite which automatically translates high level specifications
into running code in C and Ada (see [PSS98b], [PSS99], [Con95]). Rather than
formally verifying the code generator program, CVT verifies the correctness of
every individual run of the generator, comparing the source with the
produced target language program and checking their compatibility. This approach
of translation validation seems to be promising in many other contexts, where
verification of the operation of a translator or a compiler is called for.

We will illustrate this approach by a representative example. Assume that
a source program contained the statement $z := (x_1 + y_1) * (x_2 + y_2)$ which
the translator we wish to verify compiled into the following sequence of three
assignments:

$u_1 := x_1 + y_1;\quad u_2 := x_2 + y_2;\quad z := u_1 * u_2,$

introducing the two auxiliary variables $u_1$ and $u_2$.

For this translation, CVT first constructs the verification condition

$u_1 = x_1 + y_1 \wedge u_2 = x_2 + y_2 \wedge z = u_1 \cdot u_2 \rightarrow z = (x_1 + y_1) \cdot (x_2 + y_2),$

whose validity we wish to check.

The second step performed by CVT in handling such a formula is to abstract
the concrete functions appearing in the formula, such as addition and
multiplication, by abstract (uninterpreted) function symbols. The abstracted version of
the above implication is:

$u_1 = F(x_1, y_1) \wedge u_2 = F(x_2, y_2) \wedge z = G(u_1, u_2) \rightarrow z = G(F(x_1, y_1), F(x_2, y_2))$

Clearly, if the abstracted version is valid then so is the original concrete one.

Next, we perform the Ackermann reduction [Ack54], replacing each functional
term by a fresh variable but adding, for each pair of terms with the same function
symbol, an extra antecedent which guarantees the functionality of these terms.
Namely, that if the two arguments of the original terms were equal, then the
terms should be equal. It is not difficult to see that this transformation preserves
validity.

Applying the Ackermann reduction to the abstracted formula, we obtain the
following equality formula:

$\varphi:\ \left(\begin{array}{l} (x_1 = x_2 \wedge y_1 = y_2 \rightarrow f_1 = f_2)\ \wedge \\ (u_1 = f_1 \wedge u_2 = f_2 \rightarrow g_1 = g_2)\ \wedge \\ u_1 = f_1 \wedge u_2 = f_2 \wedge z = g_1 \end{array}\right) \rightarrow z = g_2 \qquad (1)$

Here $f_1$, $f_2$, $g_1$ and $g_2$ are the fresh variables replacing $F(x_1, y_1)$,
$F(x_2, y_2)$, $G(u_1, u_2)$ and $G(F(x_1, y_1), F(x_2, y_2))$, respectively.
Note the extra antecedent ensuring the functionality of F by identifying the
conditions under which $f_1$ should equal $f_2$, and the similar requirement for G.

This shows how equality formulas such as $\varphi$ of Equation (1) arise in the
process of translation validation.

Equality Formulas: Even though the variables appearing in an equality
formula such as $\varphi$ are assumed to be completely uninterpreted, it is not difficult to
see that a formula such as $\varphi$ is generally valid (satisfiable) iff it is valid
(respectively, satisfiable) when the variables appearing in the formula range over the
integers. This leads to the following definition of the syntax of equality formulas
that the method presented in this paper can handle.

Let $x_1, x_2, \ldots$ be a set of integer variables, and $b_1, b_2, \ldots$ be a set of Boolean
variables. We define the set of terms T by

$T ::= \text{integer constant} \mid x_i \mid \text{if } \Phi \text{ then } T_1 \text{ else } T_2$

The set of equality formulas $\Phi$ is defined by

$\Phi ::= b_j \mid \neg\Phi \mid \Phi_1 \vee \Phi_2 \mid T_1 = T_2 \mid \text{if } \Phi_0 \text{ then } \Phi_1 \text{ else } \Phi_2$

Additional Boolean operators such as $\wedge$ and $\rightarrow$ can be defined in terms of $\neg$ and $\vee$.

For simplicity, we will not consider in this paper the cases of integer constants
and Boolean variables. The full algorithm is presented in [PRSS98].

3 The Solution: Instantiations over Small Domains 

Our solution strategy for checking whether a given equality formula $\varphi$ is
satisfiable can be summarized as follows:

1. Determine, in polynomial time, a range allocation $R : Vars(\varphi) \mapsto 2^{\mathbb{N}}$, by
mapping each integer variable $x_i \in \varphi$ into a small finite set of integers, such
that $\varphi$ is satisfiable (valid) iff it is satisfiable (respectively, valid) over some
R-interpretation.

2. Encode each variable $x_i$ as an enumerated type over its finite domain $R(x_i)$,
and use a standard BDD package to construct a BDD $B_\varphi$. Formula $\varphi$ is
satisfiable iff $B_\varphi$ is not identical to 0.

We define the complexity of a range allocation R to be the size of the state-space
spanned by it, that is, if $Vars(\varphi) = \{x_1, \ldots, x_n\}$, then the complexity of R is
$|R| = |R(x_1)| \times |R(x_2)| \times \cdots \times |R(x_n)|$. Obviously, the success of our method
depends on our ability to find range allocations with small complexity.

3.1 Some Simple Bounds 

In theory, there always exists a singleton range allocation $R^*$, satisfying the
above requirements, such that $R^*$ allocates each variable a domain consisting of a
single natural, i.e., $|R^*| = 1$. This is supported by the following trivial argument.
If $\varphi$ is satisfiable, then there exists an assignment $(x_1, \ldots, x_n) = (z_1, \ldots, z_n)$
satisfying $\varphi$. It is sufficient to take $R^* : x_1 \mapsto \{z_1\}, \ldots, x_n \mapsto \{z_n\}$ as the singleton
allocation. If $\varphi$ is unsatisfiable, it is sufficient to take $R^* : x_1, \ldots, x_n \mapsto \{0\}$.

However, finding the singleton allocation $R^*$ amounts to a head-on attack
on the primary NP-complete problem. Instead, we generalize the problem and
attempt to find a small range allocation which is adequate for a set of formulas
$\Phi$ which are "structurally similar" to the formula $\varphi$, and includes $\varphi$ itself.

Consequently, we say that the range allocation R is adequate for the formula
set $\Phi$ if, for every equality formula in the set $\varphi \in \Phi$, $\varphi$ is satisfiable iff $\varphi$ is
satisfiable over R.

First, let us consider the set $\Phi_n$ of all equality formulas with at most n
variables.

Claim 1 (Folk theorem) The uniform range allocation $R : \{x_1, \ldots, x_n\} \mapsto [1..n]$
with complexity $n^n$ is adequate for $\Phi_n$.

We can do better if we do not insist on a uniform range allocation which allocates
the same domain to all variables. Thus the range allocation $R : x_i \mapsto [1..i]$ is
also adequate for $\Phi_n$ and has the better complexity of $n!$. In fact, we conjecture
that $n!$ is also a lower bound on the size of range allocations adequate for $\Phi_n$.
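
To give a feel for these two bounds, the following sketch simply evaluates $n^n$ and $n!$; the function names are ours and serve only as an illustration.

```python
import math


def uniform_complexity(n):
    """Complexity of the uniform allocation x_i -> [1..n]."""
    return n ** n


def triangular_complexity(n):
    """Complexity of the allocation x_i -> [1..i]: 1 * 2 * ... * n."""
    return math.factorial(n)


n = 11
print(uniform_complexity(n), triangular_complexity(n))
```

For n = 11 the uniform allocation spans 285311670611 states, against 39916800 for the allocation $x_i \mapsto [1..i]$.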
The formula set $\Phi_n$ utilizes only a simple structural characteristic common
to all of its members, namely, the number of variables. Focusing on additional
structural characteristics of formulas, we obtain much smaller adequate range
allocations, which we proceed to describe in the rest of this paper.

3.2 An Approach Based on the Set of Atomic Formulas 

We assume that $\varphi$ has no constants or Boolean variables, and is given in a
positive form, i.e. negations are only allowed within atomic formulas of the form
$x_i \neq x_j$. An important property of formulas in positive form is that they are
monotonically satisfied, i.e. if $S_1$ and $S_2$ are two subsets of atomic formulas of $\varphi$
(where $\varphi$ is given in positive form), and $S_1 \subseteq S_2$, then $S_1 \models \varphi$ implies $S_2 \models \varphi$.
Any equality formula can be brought into a positive form, by expressing all
Boolean operations such as $\rightarrow$ and the if-then-else construct in terms of the
basic Boolean operations $\neg$, $\vee$ and $\wedge$, and pushing all negations inside.

Let $At(\varphi)$ be the set of all atomic formulas of the form $x_i = x_j$ or $x_i \neq x_j$
appearing in $\varphi$, and let $\Phi(A)$ be the family of all equality formulas which have
A as the set of their atomic formulas. Obviously $\varphi \in \Phi(At(\varphi))$. Note that the
family defined by the atomic formula set $\{x_1 = x_2, x_1 \neq x_2\}$ includes both the
satisfiable formula $x_1 = x_2 \vee x_1 \neq x_2$ and the unsatisfiable formula $x_1 = x_2 \wedge x_1 \neq x_2$.

For a set of atomic formulas A, we say that the subset $B = \{\psi_1, \ldots, \psi_k\} \subseteq A$
is consistent if the conjunction $\psi_1 \wedge \cdots \wedge \psi_k$ is satisfiable. Note that a set B is
consistent iff it does not contain a chain of the form $x_1 = x_2, x_2 = x_3, \ldots, x_{n-1} = x_n$
together with the formula $x_1 \neq x_n$.
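
The chain characterization of consistency suggests a simple check: merge the variables related by equalities into classes, then make sure no disequality relates two members of the same class. The sketch below uses a union-find structure for this; it is our own illustration (the name `is_consistent` is hypothetical), not code from the paper.

```python
def is_consistent(equalities, disequalities):
    """Return True iff no disequality x != y is contradicted by a chain of
    equalities connecting x and y."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the classes of every pair of variables stated to be equal.
    for a, b in equalities:
        root_a, root_b = find(a), find(b)
        parent[root_a] = root_b

    # A disequality inside a single class closes a contradictory chain.
    return all(find(a) != find(b) for a, b in disequalities)


print(is_consistent([("x1", "x2"), ("x2", "x3")], [("x1", "x3")]))  # chain + x1 != x3
print(is_consistent([("x1", "x2")], [("x2", "x3")]))
```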

Given a set of atomic formulas A, a range allocation R is defined to be
satisfactory for A if every consistent subset $B \subseteq A$ is R-satisfiable.

For example, the range allocation $R : x_1, x_2, x_3 \mapsto \{0\}$ is satisfactory for the
atomic formula set $\{x_1 = x_2, x_2 = x_3\}$, while the allocation $R : x_1 \mapsto \{1\}, x_2 \mapsto \{2\}, x_3 \mapsto \{3\}$
is satisfactory for the formula set $\{x_1 \neq x_2, x_2 \neq x_3\}$. On the
other hand, no singleton allocation is satisfactory for the set $\{x_1 = x_2, x_1 \neq x_2\}$.
A minimal satisfactory allocation for this set is $R : x_1 \mapsto \{1\}, x_2 \mapsto \{1, 2\}$.
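
For very small atom sets, the definition of a satisfactory allocation can be checked by exhaustive enumeration. The sketch below is a brute-force illustration under our own encoding (atoms as `(a, op, b)` triples with `op` in `"="`, `"!="`); it decides consistency by searching the uniform range [0..n-1], which is enough by the folk theorem. It is exponential and meant only to replay the three examples above.

```python
from itertools import combinations, product


def satisfiable_over(atoms, domains):
    """True iff some assignment drawn from `domains` satisfies all atoms."""
    names = sorted({v for a, _, b in atoms for v in (a, b)})
    for values in product(*(sorted(domains[v]) for v in names)):
        env = dict(zip(names, values))
        if all((env[a] == env[b]) == (op == "=") for a, op, b in atoms):
            return True
    return False


def is_satisfactory(atoms, R):
    """True iff every consistent subset of atoms is satisfiable over R."""
    names = sorted({v for a, _, b in atoms for v in (a, b)})
    uniform = {v: range(len(names)) for v in names}  # n values always suffice
    for r in range(len(atoms) + 1):
        for subset in combinations(atoms, r):
            if satisfiable_over(subset, uniform) and not satisfiable_over(subset, R):
                return False
    return True


eqs = [("x1", "=", "x2"), ("x2", "=", "x3")]
neqs = [("x1", "!=", "x2"), ("x2", "!=", "x3")]
both = [("x1", "=", "x2"), ("x1", "!=", "x2")]

print(is_satisfactory(eqs, {"x1": {0}, "x2": {0}, "x3": {0}}))
print(is_satisfactory(neqs, {"x1": {1}, "x2": {2}, "x3": {3}}))
print(is_satisfactory(both, {"x1": {1}, "x2": {1}}))
print(is_satisfactory(both, {"x1": {1}, "x2": {1, 2}}))
```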

Claim 2 The range allocation R is satisfactory for the atomic formula set A iff
R is adequate for $\Phi(A)$, the set of formulas $\varphi$ such that $At(\varphi) = A$.

Thus, we concentrate our efforts on finding a small range allocation which is
satisfactory for $A = At(\varphi)$ for a given equality formula $\varphi$. In view of the claim,
we will continue to use the terms satisfactory and adequate synonymously.

Partition the set A into the two sets $A = A_{=} \cup A_{\neq}$, where $A_{=}$ contains all the
equality formulas in A, while $A_{\neq}$ contains the disequalities. Variable $x_i$ is called
a mixed variable iff $(x_i, x_j) \in A_{=}$ and $(x_i, x_k) \in A_{\neq}$ for some $x_j, x_k \in Vars(\varphi)$.

Note that the sets $A_{=}(\varphi)$ and $A_{\neq}(\varphi)$ for a given formula $\varphi$ can be computed
without actually carrying out the transformation to positive form. All that is
required is to check whether a given atomic formula has a positive or negative
polarity within $\varphi$. A sub-formula p has a positive polarity within $\varphi$ iff it is nested
under an even number of negations.
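
The polarity rule can be turned into a short recursive traversal. The sketch below is an illustration under our own assumptions: formulas are encoded as nested tuples, only the connectives `not`, `and`, `or` and `implies` are handled, and the name `atom_sets` is hypothetical. An implication flips the polarity of its antecedent, exactly as an explicit negation would. The example input is the negation of the formula of Equation (1).

```python
def atom_sets(formula, positive=True, eqs=None, neqs=None):
    """Collect A_= (equalities of positive polarity) and A_!= (equalities of
    negative polarity, i.e. disequalities of the positive form)."""
    if eqs is None:
        eqs, neqs = set(), set()
    op = formula[0]
    if op == "eq":
        (eqs if positive else neqs).add(frozenset(formula[1:]))
    elif op == "not":
        atom_sets(formula[1], not positive, eqs, neqs)
    elif op in ("and", "or"):
        for sub in formula[1:]:
            atom_sets(sub, positive, eqs, neqs)
    elif op == "implies":
        atom_sets(formula[1], not positive, eqs, neqs)  # antecedent is negated
        atom_sets(formula[2], positive, eqs, neqs)
    else:
        raise ValueError("unsupported connective: %r" % (op,))
    return eqs, neqs


def eq(a, b):
    return ("eq", a, b)


# The formula of Equation (1), then its negation.
phi = ("implies",
       ("and",
        ("implies", ("and", eq("x1", "x2"), eq("y1", "y2")), eq("f1", "f2")),
        ("implies", ("and", eq("u1", "f1"), eq("u2", "f2")), eq("g1", "g2")),
        eq("u1", "f1"), eq("u2", "f2"), eq("z", "g1")),
       eq("z", "g2"))
A_eq, A_neq = atom_sets(("not", phi))

# Mixed variables occur in at least one equality and one disequality.
mixed = set().union(*A_eq) & set().union(*A_neq)
print(sorted(map(sorted, A_eq)), sorted(map(sorted, A_neq)), sorted(mixed))
```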

Example 1. Let us illustrate these concepts on the formula $\varphi$ of Equation (1),
whose validity we wished to check.
Since our main algorithm checks for satisfiability, we proceed to form the
positive form of $\neg\varphi$, which is given by:

$\neg\varphi:\ \left(\begin{array}{l} (x_1 \neq x_2 \vee y_1 \neq y_2 \vee f_1 = f_2)\ \wedge \\ (u_1 \neq f_1 \vee u_2 \neq f_2 \vee g_1 = g_2)\ \wedge \\ u_1 = f_1 \wedge u_2 = f_2 \wedge z = g_1 \end{array}\right) \wedge z \neq g_2,$

and therefore

$A_{=} : \{(f_1 = f_2), (g_1 = g_2), (u_1 = f_1), (u_2 = f_2), (z = g_1)\}$

$A_{\neq} : \{(x_1 \neq x_2), (y_1 \neq y_2), (u_1 \neq f_1), (u_2 \neq f_2), (z \neq g_2)\}$

Note that $u_1$, $u_2$, $f_1$, $f_2$, $g_2$ and $z$ in this example are mixed variables.

□

This example would require a state-space of $11!$ if we used the range allocation
[1..i] ($11^{11}$ using [1..n]). As is shown below, our algorithm finds an adequate
range allocation of size 16.

3.3 A Graph-Theoretic Representation of the Sets $A_{=}$, $A_{\neq}$

The sets $A_{\neq}$ and $A_{=}$ can be represented by two graphs, $G_{\neq}$ and $G_{=}$, defined as
follows:

$(x_i, x_j)$ is an edge on the equalities graph $G_{=}$ iff $(x_i = x_j) \in A_{=}$.

$(x_i, x_j)$ is an edge on the disequalities graph $G_{\neq}$ iff $(x_i \neq x_j) \in A_{\neq}$.

We refer to the joint graph as G. Each vertex in G represents a variable. Vertices
representing mixed variables are called mixed vertices.

An inconsistent subset $B \subseteq A$ will appear as a contradictory cycle, i.e. a cycle
consisting of a single $G_{\neq}$ edge and any positive number of $G_{=}$ edges.

In Fig. 1, we present the graph G corresponding to the formula $\neg\varphi$, where
$G_{=}$-edges are represented by dashed lines and $G_{\neq}$-edges are represented by solid
lines. Note the three contradictory cycles: $(g_2 - g_1 - z)$, $(u_1 - f_1)$, $(u_2 - f_2)$.

Fig. 1. The Graph $G : G_{=} \cup G_{\neq}$ representing $\neg\varphi$




4 The Basic Range Allocation Algorithm 

Following is a two-step algorithm for computing an economic range allocation R
for the variables in a given formula $\varphi$.

I. Pre-processing

Initially, $R(x_i) = \emptyset$, for all vertices $x_i \in G$.

A. Remove all $G_{\neq}$ edges which do not lie on a contradictory cycle.

B. For every singleton vertex (a vertex comprising a connected component by
itself) $x_i$, add to $R(x_i)$ a fresh value $u_i$ and remove $x_i$ from the graph.

II. Value Allocation

A. While there are mixed vertices in G do:

1. Choose a mixed vertex $x_i$. Add a fresh value $u_i$ to $R(x_i)$.

2. Assign $R(x_j) := R(x_j) \cup \{u_i\}$ for each vertex $x_j$ s.t. there is a $G_{=}$-path
from $x_i$ to $x_j$.

3. Remove $x_i$ from the graph.

B. For each (remaining) connected component $C_{=}$, add a common fresh
value $u_{C_{=}}$ to $R(x_k)$, for every $x_k \in C_{=}$.

We refer to the fresh values $u_i$ added to $R(x_i)$ in steps I.B and II.A.1, and $u_{C_{=}}$
added to $R(x_k)$ for $x_k \in C_{=}$ in step II.B, as the characteristic values of these
vertices. We write $char(x_i) = u_i$ and $char(x_k) = u_{C_{=}}$. Note that every vertex is
assigned a single characteristic value. Vertices which are assigned their
characteristic values in steps I.B and II.A.1 are called individually assigned vertices, while
the vertices assigned characteristic values in step II.B are called communally
assigned vertices. Fresh values are assigned in ascending order, so that $char(x_i) <
char(x_j)$ implies that $x_i$ was assigned its characteristic value before $x_j$.

The presented description of the algorithm leaves open the order in which
vertices are chosen in step II.A, which has a strong impact on the size of the
resulting state-space. The set of vertices that are removed in this step can be
seen as a vertex cover of the $G_{\neq}$ edges, i.e., a set of vertices V such that every
$G_{\neq}$ edge has at least one of its ends in V. To keep this set as small as possible,
we apply the known "greedy" heuristic for the Minimal Vertex Cover problem,
and accordingly we denote this set by mvc. We choose mixed vertices following a
descending degree on $G_{\neq}$. Among vertices with equal degrees on $G_{\neq}$, we choose
the one with the highest degree on $G_{=}$. This heuristic seems not only to find a
small vertex cover, it also partitions the graph rather rapidly.
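
The two steps and the greedy ordering can be put together as follows. This is a minimal sketch of the basic algorithm as described above, not the authors' implementation: the function names are ours, fresh values start at 0, and ties that the heuristic leaves open are broken by the order of the input variable list (an assumption). The input is the graph of the running example.

```python
from itertools import count


def reachable(start, eq_adj, alive):
    """Vertices connected to `start` by a (possibly empty) path of equality
    edges through vertices that are still in the graph."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in eq_adj[v]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def basic_range_allocation(variables, equalities, disequalities):
    eq_adj = {v: set() for v in variables}
    for a, b in equalities:
        eq_adj[a].add(b)
        eq_adj[b].add(a)
    alive = set(variables)

    # Step I.A: keep a disequality edge only if its ends are joined by an
    # equality path, i.e. only if it lies on a contradictory cycle.
    neq_adj = {v: set() for v in variables}
    for a, b in disequalities:
        if b in reachable(a, eq_adj, alive):
            neq_adj[a].add(b)
            neq_adj[b].add(a)

    def deg(adj, v):
        return sum(1 for w in adj[v] if w in alive)

    R = {v: set() for v in variables}
    fresh = count()

    # Step I.B: singleton vertices get a fresh value and leave the graph.
    for v in variables:
        if deg(eq_adj, v) == 0 and deg(neq_adj, v) == 0:
            R[v].add(next(fresh))
            alive.discard(v)

    # Step II.A: greedily remove mixed vertices (highest disequality degree
    # first, then highest equality degree), spreading each fresh value along
    # equality paths.
    while True:
        mixed = [v for v in variables if v in alive
                 and deg(eq_adj, v) > 0 and deg(neq_adj, v) > 0]
        if not mixed:
            break
        x = max(mixed, key=lambda v: (deg(neq_adj, v), deg(eq_adj, v)))
        u = next(fresh)
        for w in reachable(x, eq_adj, alive):
            R[w].add(u)
        alive.discard(x)

    # Step II.B: one common fresh value per remaining equality component.
    done = set()
    for v in variables:
        if v in alive and v not in done:
            comp = reachable(v, eq_adj, alive)
            u = next(fresh)
            for w in comp:
                R[w].add(u)
            done |= comp
    return R


variables = ["x1", "x2", "y1", "y2", "u1", "f1", "f2", "u2", "g2", "z", "g1"]
equalities = [("f1", "f2"), ("g1", "g2"), ("u1", "f1"), ("u2", "f2"), ("z", "g1")]
disequalities = [("x1", "x2"), ("y1", "y2"), ("u1", "f1"), ("u2", "f2"), ("z", "g2")]

R = basic_range_allocation(variables, equalities, disequalities)
size = 1
for v in variables:
    size *= len(R[v])
print(R, size)
```

With this tie-breaking the sketch removes f1, f2 and g2 in step II.A, in that order.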

Example 2. The following table represents the sequence of steps resulting from
the application of the Basic Range Allocation algorithm to the formula $\neg\varphi$:

Step/var       x1  x2  y1  y2  u1   f1  f2   u2     g2  z    g1   Removed
Step I.A                                                          Edges: (x1 - x2), (y1 - y2)
Step I.B       0   1   2   3                                      x1, x2, y1, y2
Step II.A (f1)                 4    4   4    4                    f1
Step II.A (f2)                          4,5  4,5                  f2
Step II.A (g2)                                      6   6    6    g2
Step II.B                      4,7
Step II.B                                    4,5,8
Step II.B                                               6,9  6,9
Final R-sets   0   1   2   3   4,7  4   4,5  4,5,8  6   6,9  6,9  Size = 48

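
As a sanity check of the table, the final R-sets are small enough to enumerate explicitly. The sketch below is our own illustration, with plain enumeration standing in for the BDD-based test: it searches all 48 R-interpretations for one that satisfies the positive form of $\neg\varphi$ from Example 1. Finding none means, given that the allocation is adequate, that $\neg\varphi$ is unsatisfiable and hence that $\varphi$ of Equation (1) is valid.

```python
from itertools import product

# Final R-sets copied from the table of Example 2.
R = {
    "x1": {0}, "x2": {1}, "y1": {2}, "y2": {3},
    "u1": {4, 7}, "f1": {4}, "f2": {4, 5}, "u2": {4, 5, 8},
    "g2": {6}, "z": {6, 9}, "g1": {6, 9},
}


def not_phi(v):
    """Positive form of the negation of Equation (1)."""
    return ((v["x1"] != v["x2"] or v["y1"] != v["y2"] or v["f1"] == v["f2"])
            and (v["u1"] != v["f1"] or v["u2"] != v["f2"] or v["g1"] == v["g2"])
            and v["u1"] == v["f1"] and v["u2"] == v["f2"] and v["z"] == v["g1"]
            and v["z"] != v["g2"])


names = list(R)
interpretations = [dict(zip(names, values))
                   for values in product(*(sorted(R[n]) for n in names))]
witnesses = [v for v in interpretations if not_phi(v)]
print(len(interpretations), len(witnesses))
```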



5 The Algorithm is Sound 

In this section we argue for the soundness of the basic algorithm. We begin by
describing a procedure which, given the allocation R produced by the basic
algorithm and a consistent subset B, assigns to each variable $x_i \in G$ an integer value
$a(x_i) \in R(x_i)$. We then continue by proving that this assignment guarantees that
every consistent subset is satisfied, and that it is always feasible.

An Assignment Procedure

Given a consistent subset B and its representative graph $G(B)$, assign to each
vertex $x_i \in G(B)$ a value $a(x_i) \in R(x_i)$, according to the following rules:

1. If $x_i$ is connected by a (possibly empty) $G_{=}(B)$-path to an individually
assigned vertex $x_j$, assign to $x_i$ the minimal value of $char(x_j)$ among such $x_j$'s.

2. Otherwise, assign to $x_i$ its communally assigned value $char(x_i)$.

Example 3. Consider the R-sets that were computed in Example 2. Let us apply
the assignment procedure to a subset B that contains all edges excluding both
edges between $u_1$ and $f_1$, the dashed edge between $g_1$ and $g_2$, and the solid edge
between $f_2$ and $u_2$. The assignment will be as follows:

By rule 1, $f_1$, $f_2$ and $u_2$ are assigned the value $char(f_1)$ = '4', because $f_1$ was
the first mixed vertex in the sub-graph $\{f_1, f_2, u_2\}$ that was removed in step
II.A, and consequently it has the minimal characteristic value.

By rule 1, $x_1$, $x_2$, $y_1$ and $y_2$ are assigned the characteristic values '0', '1', '2',
'3' respectively, which they received in step I.B.

By rule 1, $g_2$ is assigned the value $char(g_2)$ = '6' which it received in II.A.

By rule 2, $z$ and $g_1$ are assigned the value '9' which they received in II.B.

□
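
The two rules can be replayed mechanically on this example. The sketch below is an illustration with names of our own choosing (`assign`, `reachable`); the characteristic values and the set of individually assigned vertices are copied from Example 2, and the subset B follows the description above (we also keep the two disequalities on x and y in B, which is harmless).

```python
def reachable(start, edges):
    """Vertices connected to `start` by a (possibly empty) path over `edges`."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def assign(variables, char, individually, eq_edges_B):
    """Rule 1: minimal characteristic value among individually assigned
    vertices reachable by equality edges of B. Rule 2: own communal value."""
    a = {}
    for x in variables:
        anchors = [char[y] for y in reachable(x, eq_edges_B) if y in individually]
        a[x] = min(anchors) if anchors else char[x]
    return a


variables = ["x1", "x2", "y1", "y2", "u1", "f1", "f2", "u2", "g2", "z", "g1"]
char = {"x1": 0, "x2": 1, "y1": 2, "y2": 3, "f1": 4, "f2": 5, "g2": 6,
        "u1": 7, "u2": 8, "z": 9, "g1": 9}
individually = {"x1", "x2", "y1", "y2", "f1", "f2", "g2"}
R = {"x1": {0}, "x2": {1}, "y1": {2}, "y2": {3}, "u1": {4, 7}, "f1": {4},
     "f2": {4, 5}, "u2": {4, 5, 8}, "g2": {6}, "z": {6, 9}, "g1": {6, 9}}

# Subset B of Example 3: equalities and disequalities that remain.
B_eq = [("f1", "f2"), ("u2", "f2"), ("z", "g1")]
B_neq = [("x1", "x2"), ("y1", "y2"), ("z", "g2")]

a = assign(variables, char, individually, B_eq)
satisfied = (all(a[p] == a[q] for p, q in B_eq)
             and all(a[p] != a[q] for p, q in B_neq))
feasible = all(a[x] in R[x] for x in variables)
print(a, satisfied, feasible)
```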

Claim 3 The assignment procedure satisfies every consistent subset B, 

Proof: We have to show that all constraints implied by the set B are satisfied 
by the assignment. 

Consider first the case of two variables Xi and Xj which are connected by a 
G^[B) edge. We have to show that a(x^) = a[xj). Since Xi and Xj are G^[B)~ 
connected, they belong to the same -connected component. If they were 

both assigned a value in step 1 , then they were assigned the minimal value of an 
individually assigned vertex to which they are both (5)-connected. If, on the 
other hand, they were both assigned a value in step 2 , then they were assigned 
the communal value assigned to the G^ component to which they both belong. 
Thus, in both cases they are assigned the same value. 

Next, consider the case of two variables $x_i$ and $x_j$ which are connected by a 
$G_\neq(B)$ edge. To show that $a(x_i) \neq a(x_j)$, we distinguish between three cases: 

A: If both $x_i$ and $x_j$ were assigned values by rule 1, they must have 
inherited their values from two distinct individually allocated vertices. 
Otherwise, they would both be connected by a $G_=(B)$ path to a common vertex, which 
together with the $(x_i, x_j)$ $G_\neq(B)$-edge would close a contradictory cycle, excluded by 
the assumption that B is consistent. 

B: If one of $x_i$, $x_j$ was assigned a value by rule 1 while the other acquired its 
value by rule 2, then since any communal value is distinct from any individually 
allocated value, $a(x_i)$ must differ from $a(x_j)$. 

C: The remaining case is when both $x_i$ and $x_j$ were assigned values by 
rule 2. The fact that they were not assigned values in step 1 implies that their 
characteristic values are not individually but communally allocated. If $a(x_i) = 
a(x_j)$ it means that $x_i$ and $x_j$ were allocated their communal values in the 
same step II.B of the allocation algorithm, which implies that they had a $G_=$-path 
between them. Hence, $x_i$ and $x_j$ belong to a contradictory cycle, and the 
solid edge was therefore still part of G in the beginning of step II.A. 

By definition of mvc, at least one of them was individually assigned in step 
II.A.1, and consequently, according to the assignment procedure, the component 
it belongs to is assigned a value by rule 1, in contrast to our assumption. We 
can therefore conclude that our assumption that $a(x_i) = a(x_j)$ was false. □ 

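The two rules used in the proofs above can be illustrated by a minimal sketch. It is only one reading of the assignment procedure as the claims describe it, not the CVT implementation: `char` is assumed to map each individually assigned vertex to its characteristic value, `communal` is assumed to map every other vertex to the communal value of its $G_=$ component, and `eq_edges` is assumed to hold the equality edges of the chosen consistent subset B. All names are hypothetical.

```python
def assign(vertices, eq_edges, char, communal):
    """Assign one value per vertex for a consistent subset B (illustrative sketch).

    vertices : list of vertex names
    eq_edges : equality edges of B, as pairs of vertices
    char     : characteristic values of individually assigned vertices
    communal : communal values of all remaining vertices (assumed input)
    """
    # Union-find over the equality edges of B.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in eq_edges:
        parent[find(u)] = find(v)

    # Group the vertices into equality-connected components.
    components = {}
    for v in vertices:
        components.setdefault(find(v), []).append(v)

    a = {}
    for members in components.values():
        individual = [char[v] for v in members if v in char]
        if individual:
            # Rule 1: minimal characteristic value of a reachable
            # individually assigned vertex.
            value = min(individual)
        else:
            # Rule 2: the communal value; all members are assumed to
            # share it because they lie in the same G_= component.
            value = communal[members[0]]
        for v in members:
            a[v] = value
    return a


print(assign(["x", "y", "z", "w"], [("x", "y")], {"x": 0}, {"z": 9, "w": 9}))
```

In this toy input, `y` inherits the value of `x` through the equality edge, while `z` and `w` fall back to their communal value.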
Claim 4 The assignment procedure is feasible (i.e. the R-sets include the values 
required by the assignment procedure). 

Proof: Consider first the two classes of vertices that are assigned a value by rule 
1. The first class includes vertices that are removed in step I.B. These vertices 
have only one (empty) $G_=(B)$ path to themselves, and are therefore assigned the 
characteristic value they received in this step. The second class includes vertices 
that have a (possibly empty) $G_=(B)$ path to a vertex from mvc. Let $x_i$ denote 
such a vertex, and let $x_j$ be the vertex with the minimal characteristic value that 
$x_i$ can reach on $G_=(B)$. Since $x_i$ and all the vertices on this path were still part 
of the graph when $x_j$ was removed in step II.A, then according to step II.A.2, 
$char(x_j)$ was added to $R(x_i)$. Thus, the assignment of $char(x_j)$ to $x_i$ is feasible. 

Next, consider the vertices that are assigned a value by rule 2. Every vertex 
that is removed in step I.B or II.A is clearly assigned a value by rule 1. All the 
other vertices are communally assigned a value in step II.B. In particular, the 
vertices that do not have a path to an individually assigned vertex are assigned 
such a value. Thus, the two steps of the assignment procedure are feasible. □ 

Claim 5 $\varphi$ is satisfiable iff $\varphi$ is satisfiable over R. 

Proof: By claims 3 and 4, R is satisfactory for $A_= \cup A_\neq$. Consequently, by claim 
2, R is adequate for $\Phi(At(\varphi))$, and in particular R is adequate for $\varphi$. Thus, 
by the definition of adequacy, $\varphi$ is satisfiable iff $\varphi$ is satisfiable over R. □ 

6 Improvements of the Basic Algorithm 

There are several improvements to the basic algorithm, which can significantly 
decrease the size of the resulting state-space. Here, we present some of them. 

6.1 Coloring 

Step II.A.1 of the basic algorithm calls for allocation of distinct characteristic 
values to the mixed vertices. This is not always necessary, as we demonstrate in 
the following small example. 

Example 4. Consider the subgraph $\{u_1, f_1, f_2, u_2\}$ from the graph of Fig. 1. 
Application of the basic algorithm to this subgraph may yield the following 
allocation, where the assigned characteristic values are underlined: 
$R_1: u_1 \mapsto \{0, \underline{2}\},\ f_1 \mapsto \{\underline{0}\},\ f_2 \mapsto \{0, \underline{1}\},\ u_2 \mapsto \{0, 1, \underline{3}\}$. 
This allocation leads to a state-space complexity of 12. 

By relaxing the requirement that all individually assigned characteristic 
values should be distinct, we can obtain the allocation 
$R_2: u_1 \mapsto \{0, 2\},\ f_1 \mapsto \{0\},\ f_2 \mapsto \{0\},\ u_2 \mapsto \{0, 1\}$ 
with a state-space complexity of 4. 

It is not difficult to see that $R_2$ is adequate for the considered subgraph. □ 
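
The complexity figures quoted in this example are simply the product of the range sizes. The short sketch below recomputes them; the function name `state_space` is hypothetical, and the range of $u_1$ in the first allocation is assumed to be $\{0, 2\}$, which is the only size consistent with the stated complexity of 12.

```python
from math import prod


def state_space(allocation):
    """Number of candidate assignments: the product of the range sizes."""
    return prod(len(values) for values in allocation.values())


# Allocation produced by the basic algorithm (range of u1 is an assumption).
R1 = {"u1": {0, 2}, "f1": {0}, "f2": {0, 1}, "u2": {0, 1, 3}}
# Allocation obtained when f1 and f2 share a characteristic value.
R2 = {"u1": {0, 2}, "f1": {0}, "f2": {0}, "u2": {0, 1}}

print(state_space(R1), state_space(R2))
```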

We will now explore some conditions under which the requirement of distinct 
individually assigned values can be relaxed while maintaining adequacy of the 
allocation. 

Assume that the mixed vertices are assigned their individual characteristic 
values in the order $x_1, \ldots, x_m$. Assume that we have already assigned individual 
char values to $x_1, \ldots, x_{r-1}$ and are about to assign a char value to $x_r$. What 
may be the reasons for not assigning to $x_r$ the value of $char(x_i)$ for some $i < r$? 
Examining our assignment procedure, such an assignment may lead to violation 
of the $G_\neq$-constraints only if there exists a path of the form: 

$$x_i \;\text{- - -}\; \cdots \;\text{- - -}\; x_j \;\rule[0.5ex]{2em}{0.4pt}\; x_k \;\text{- - -}\; \cdots \;\text{- - -}\; x_r$$

where for every individually assigned vertex $x_p$ on the $G_=$-path from $x_i$ to $x_j$ 
(including $x_j$), $i < p$, and equivalently for every vertex $x_q$ on the $G_=$-path from 
$x_r$ to $x_k$ (including $x_k$), $r < q$. 

This observation is based on the way the assignment procedure works: it 
assigns to all vertices in a connected $G_=(B)$ component the characteristic value 
of the mixed vertex with the lowest index. Thus, if there exists a vertex $x_p$ on 
the path from $x_i$ to $x_j$ s.t. $p < i$ then $x_j$ will not be assigned the value $char(x_i)$. 
Consequently, there is no risk that the assignment procedure will assign $x_j$ and 
$x_k$ the same value, even if the characteristic values of $x_i$ and $x_r$ are equal. 

We refer to vertices that have such a path between them as being incompatible 
and assign them different characteristic values. 

Assigning Values to Mixed Vertices with Possible Duplication. 

To allow duplicate characteristic values, we add the following as step I.C of the 
algorithm. 

1. Predetermine the order $x_1, \ldots, x_m$ by which individually assigned variables 
will be allocated their characteristic values. 

2. Construct an incompatibility graph whose vertices are $x_1, \ldots, x_m$ and in which 
there is an edge connecting $x_i$ to $x_r$ iff $x_i$ and $x_r$ are incompatible. 

3. Find a minimal coloring for this graph, i.e. assign values (‘colors’) to its vertices 
s.t. no two neighboring vertices receive the same value. Due to the 
preprocessing step, we require that each connected component is colored 
with a unique ‘palette’ of colors. 

Step II.A.1 should be changed as follows: 

1. Choose a mixed vertex $x_i$. Add to $R(x_i)$ the color $c_i$ that was determined in 
step I.C.3 as the characteristic value of $x_i$. 

As in the case of minimal vertex covering, step 3 calls for the solution of the 
NP-hard problem of minimal coloring. In a similar way, we resolve this difficulty 
by applying one of the approximation algorithms (e.g. one of the “greedy” 
algorithms) for solving this problem. 
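
A greedy coloring of the kind mentioned above can be sketched as follows. This is a generic first-fit heuristic, not necessarily the one used in CVT; the function name and the edge-list input are assumptions, and the requirement that each connected component receives its own palette is left out for brevity.

```python
def greedy_coloring(order, edges):
    """First-fit coloring: visit vertices in the given order and give each
    one the smallest color not already used by a colored neighbor."""
    neighbors = {v: set() for v in order}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    color = {}
    for v in order:
        used = {color[n] for n in neighbors[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Two mixed vertices with no incompatibility edge may share a color.
print(greedy_coloring(["f1", "f2"], []))
# A path a - b - c needs two colors.
print(greedy_coloring(["a", "b", "c"], [("a", "b"), ("b", "c")]))
```

The result depends on the visiting order, so the heuristic gives an upper bound on the chromatic number rather than a guaranteed minimum.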

Example 5. Once more, let us consider the subgraph $\{u_1, f_1, f_2, u_2\}$ of Fig. 1. 
The modified version of the algorithm identifies the order of choosing the mixed 
vertices as $f_1, f_2$. The incompatibility graph for this ordering simply 
consists of the two vertices $f_1$ and $f_2$ with no edges. This means that we can 
color them by the same color, leading to the allocation 
$R_2: u_1 \mapsto \{0, 2\},\ f_1 \mapsto \{0\},\ f_2 \mapsto \{0\},\ u_2 \mapsto \{0, 1\}$ presented in Example 4. 

For demonstration purposes, assume that all four vertices in this component 
were connected by additional edges to other vertices, and that the removal order 
of step II.A was determined to be: $f_1, \ldots$ The resulting incompatibility graph is depicted 
in Fig. 2(a). By the definition of the incompatibility graph, every two vertices connected on this 
graph must have different characteristic values. For example $f_1$ and $u_2$ cannot 
have the same characteristic value because $G(B)$ can consist of both the solid 
edge $(f_2, u_2)$ and the dashed edge $(f_1, f_2)$ (in the original graph). Since according 
to the assignment procedure the value we assign to $f_1$ and $f_2$ is determined by 
$char(f_1)$, it must be different than $char(u_2)$. 

Since this graph can be colored by two colors, say, $f_1$ and $f_2$ colored by 0, 
while $u_1$ and $u_2$ colored by 1, we obtain the allocation 
$R_3: u_1 \mapsto \{0, 1\},\ f_1 \mapsto \{0\},\ f_2 \mapsto \{0\},\ u_2 \mapsto \{0, 1\}$. □ 




Fig. 2. (a) The incompatibility graph (b) Illustrating selective allocation 



6.2 Selective Assignments of Characteristic Values in Step II.B 

Step II.B of the basic algorithm requires an unconditional assignment of a fresh 
characteristic value to each remaining connected $G_=$ component. This is not 
always necessary, as shown by the following example. 

Example 6. Consider the graph G presented in Fig. 2(b). Applying the Range 
Allocation algorithm to this graph can yield the ordering $f_1, f_2$ and consequently 
the allocation $R_4: u_1 \mapsto \{0, 3\},\ f_1 \mapsto \{0\},\ f_2 \mapsto \{0, 1\},\ u_2 \mapsto \{0, 1, 2\}$ with 
complexity 12 (although by the coloring procedure suggested in the previous 
sub-section $u_1$ and $f_2$ can have the same characteristic value, it will not reduce 
the state-space in this case). 

Our suggestion for improvement will identify that, while it is necessary to add 
the characteristic value ‘3’ to $R(u_1)$, the addition of ‘2’ to $R(u_2)$ is unnecessary, 
and the allocation $R_5: u_1 \mapsto \{0, 3\},\ f_1 \mapsto \{0\},\ f_2 \mapsto \{0, 1\},\ u_2 \mapsto \{0, 1\}$ with 
complexity 8 is adequate for the graph of Fig. 2(b). □ 

Assume that $C_=$ is a remaining connected $G_=$ component with no mixed vertices, 
and let $K = \bigcap_{x \in C_=} R(x)$ be the set of values that are common to the allocations 
of all vertices in $C_=$ (in fact, it can be proven that for all $x \in C_=$, $R(x)$ is equal). 
Let $y_1, \ldots, y_k \notin C_=$ be all the vertices which are $G_\neq$-neighbors of vertices in $C_=$. 

The following condition is sufficient for not assigning the vertices of $C_=$ a fresh 
characteristic value: 

Condition Con: $k < |K|$, or $K - \bigcup_{i=1}^{k} R(y_i) \neq \emptyset$. 

Note that when condition Con holds, there is always a value in $K$ which is 
different from the values assigned to $y_1, \ldots, y_k$. 

For example, when we consider the component $\{u_2\}$ in the graph of Fig. 2(b), 
we have that $K = \{0, 1\}$ with $|K| = 2$, while $\{u_2\}$ has only one $G_\neq$-neighbor: 
$f_2$. Consequently, we can skip the assignment of the fresh value ‘2’ to $u_2$. 
Therefore, we modify step II.B of the basic algorithm to read as follows: 

B. For each (remaining) connected $G_=$ component $C_=$, if condition Con does 
not hold, add a common fresh value $u_{C_=}$ to $R(x_k)$, for every $x_k \in C_=$. 

A more general analysis of these situations is based on solving a set-covering 
problem (or approximations thereof) for each invocation of step II.B (more 
details are provided in [PRSS98]). Experimental results have shown that due to 
this analysis, in most cases step II.B is not activated. Furthermore, condition 
Con alone identifies almost all of these cases without further analysis. 
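
Condition Con is cheap to test. The sketch below is a hedged illustration under the reading $k < |K|$ or $K - \bigcup_i R(y_i) \neq \emptyset$; the function name and argument layout are hypothetical, and the range $\{0, 1\}$ used for the neighbor $f_2$ is an assumption taken from the allocation in Example 6.

```python
def condition_con(K, neighbor_ranges):
    """Return True when no fresh value is needed for the component.

    K               : values common to the ranges of all component vertices
    neighbor_ranges : ranges R(y_1), ..., R(y_k) of the disequality neighbors
    """
    k = len(neighbor_ranges)
    covered = set().union(*neighbor_ranges)  # values the neighbors may take
    return k < len(K) or bool(K - covered)


# Component {u2} of Fig. 2(b): K = {0, 1}, a single neighbor f2.
print(condition_con({0, 1}, [{0, 1}]))
# A component whose only common value can be taken by its only neighbor.
print(condition_con({0}, [{0}]))
```

The first call reports that the fresh value can be skipped, the second that it cannot.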

6.3 An Upper Bound 

We present an upper bound for the size of the state-space, as computed by 
our algorithm. For a dashed connected component $G^k_=$ let $n_k = |G^k_=|$ and let 
$m_k = |mvc_k|$ (the number of individually assigned vertices in $G^k_=$). Also, let $y_k$ 
denote the number of colors needed for coloring these $m_k$ vertices (obviously, 
$y_k \le m_k$). 

When calculating the maximum state-space for the component $G^k_=$ there are 
three groups of vertices to consider: 

1. For every vertex $x_i$ s.t. $i \le y_k$, $|R(x_i)| \le i$. Altogether they contribute $y_k!$ or 
less to the state-space. 

2. For every vertex $x_i$ s.t. $y_k < i \le m_k$, $|R(x_i)| \le y_k$. Altogether they contribute 
$y_k^{m_k - y_k}$ or less to the state-space. 

3. For every vertex $x_i$ s.t. $m_k < i \le n_k$, $|R(x_i)| \le y_k + 1$. Each of these vertices 
cannot have more than $y_k$ values when the $m_k$-th vertex is removed. Then, 
only one more value can be added to their R-set in step II.B (in fact, this 
additional element is rarely added, as was explained in the previous 
sub-section). Altogether these vertices contribute $(y_k + 1)^{n_k - m_k}$ or less to the 
state-space. 

Combining these three groups, the new upper bound for the state-space is: 

$$StateSpace \le \prod_k (y_k!) \cdot y_k^{\,m_k - y_k} \cdot (y_k + 1)^{\,n_k - m_k} \qquad (2)$$

The worst case, according to formula (2), is when all vertices are mixed ($G_= = G_\neq$), 
there is one connected component ($n_k = n$), the minimal vertex cover is 
$m_k = m = n - 1$ and the chromatic number $y_k$ is equal to $m_k$. Graphically, this 
is a ‘double clique’ (a clique where $G_= = G_\neq$) which brings us back to $n!$, the 
upper bound that was previously derived in Section 3. 
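
The bound is easy to evaluate numerically. The sketch below assumes the reading of formula (2) as a product over components of $y_k! \cdot y_k^{m_k - y_k} \cdot (y_k + 1)^{n_k - m_k}$; the function name and the tuple layout are hypothetical.

```python
from math import factorial


def upper_bound(components):
    """Evaluate the state-space bound for a list of (n_k, m_k, y_k) triples:
    n_k vertices, m_k individually assigned vertices, y_k colors."""
    bound = 1
    for n, m, y in components:
        bound *= factorial(y) * y ** (m - y) * (y + 1) ** (n - m)
    return bound


# Worst case for n = 5: one component, m = n - 1, y = m, giving 5! = 120.
print(upper_bound([(5, 4, 4)]))
# Four vertices, two mixed vertices sharing a single color.
print(upper_bound([(4, 2, 1)]))
```

The second call corresponds to the shape of the subgraph in Example 4 once both mixed vertices share one color, where the bound evaluates to 4.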

7 Experimental Results 

The Range Allocation algorithm proved to be very effective for the application 
of code validation. One of the reasons for this has to do with the process of 
decomposition (described in [PSS99]) which the CVT tool invokes before range 
allocation. If the right-hand side of the implication we try to prove is a conjunction 
of m clauses, then this process decomposes the implication into up to m separate 
formulas. Each of these formulas consists of one clause in the right-hand side, 
and the cone of influence on the left (this is the portion of the formula in the 
left-hand side that is needed for proving the chosen clause on the right). This 
process often leads to highly unbalanced comparison graphs: $G_=$ is relatively 
large (all the comparisons on the left-hand side with positive polarity belong to 
this graph) and $G_\neq$ is very small, resulting in a relatively small number of mixed 
vertices. These types of graphs result in very small ranges, and many times a 
large number of variables receive a single value in their range and thus become 
constants. We have many examples of formulas containing 150 integer variables 
or more (which, using the [1..n] range, results in a state-space of $150^{150}$), which 
after performing the Range Allocation algorithm, can be proved in less than a 
second with a state-space of less than 100. In most cases, these graphs are made 
of many unconnected $G_=$ components with a very small number of $G_\neq$ edges. 

We used CVT to validate an industrial-size program, code generated for 
the case study of a turbine developed by SNECMA [Con95]. The program was 
partitioned manually (by SNECMA) into 5 modules which were separately 
compiled. Altogether the specification of this system is a few thousand lines long 
and contains more than 1000 variables. After the abstraction we had about 2000 
variables. Following is a summary of the results achieved by CVT: 



Module    Conjuncts    Time (min.) 
M1        530          1:54 
M2        533          1:30 
M3        124          0:27 
M4        308          2:22 
M5        860          5:55 
Total     2355         12:08 



The figures for module M5 are only an estimate because the decomposition has 
been performed manually rather than automatically. 

We also tried to conduct a comparative study with [SGZ+98]. Although we 
had the same input files (the comparison between pipelined and non-pipelined 
microprocessors, as originally suggested by Burch and Dill [BD94]) as they did, 
it was nearly impossible to compare the results on this specific example, for 
several reasons, the most significant of which were that all the examples 
considered in [SGZ+98] were solvable in fractions of a second by both methods, 
and also led to BDDs of comparable sizes. 

We predict that a comparison on harder problems will reveal that the two 
methods are complementary. While the Boolean encoding method is efficient 
when there is a small number of comparisons, the Range Allocation algorithm 
is more efficient when there is a small number of mixed vertices. 

8 Conclusions and Directions for Future Research 

We presented the Range Allocation method, which can be used as a decision 
procedure based on finite instantiations, when validating formulas of the theory 
of equality. This method proved to be highly effective for validating formulas 
with a large number of integer and float variables. 

The method is relatively simple and easy to implement and apply. There is 
no need to rewrite the verified formula, and any satisfiability checker can be used 
as a decision procedure. 

The algorithm described in this paper is a simplified version of the full Range 
Allocation algorithm implemented in the CVT tool. The full algorithm includes 
several issues that were not discussed here mainly due to lack of space. A more 
comprehensive description of the algorithm can be found in [PRSS98]. 

The Range Allocation algorithm can be improved in various ways. For example, 
the mvc set is not unique, and the problem of choosing among mvc sets 
that have an equal size is still an open question. Furthermore, given an mvc set, 
the order in which the vertices in this set are removed in step II.A should 
also be further investigated. Another possible improvement is the identification 
of special kinds of graphs. For example, the range [1..4] is enough for any 
planar graph (where $G_= = G_\neq$). It should be rather interesting to investigate 
whether ‘real-life’ formulas have any special structure which can then be solved 
by utilizing various results from graph theory. 

Another possibility for future research is to extend the algorithm to formulas 
with less abstraction, and more specifically to formulas including the > and ≥ 
relations. 

Abstract. In using the logic of equality with uninterpreted functions to verify 
hardware systems, specific characteristics of the formula describing the correctness 
condition can be exploited when deciding its validity. We distinguish a class of 
terms we call “p-terms” for which equality comparisons can appear only in 
monotonically positive formulas. By applying suitable abstractions to the hardware 
model, we can express the functionality of data values and instruction addresses 
flowing through an instruction pipeline with p-terms. 

A decision procedure can exploit the restricted uses of p-terms by considering only 
“maximally diverse” interpretations of the associated function symbols, where 
every function application yields a different value except when constrained by 
functional consistency. We present a procedure that translates the original formula 
into one in propositional logic by interpreting the formula over a domain of fixed- 
length bit vectors and using vectors of propositional variables to encode domain 
variables. By exploiting maximal diversity, this procedure can greatly reduce the 
number of propositional variables that must be introduced. 

We present experimental results demonstrating the efficiency of this approach 
when verifying pipelined processors using the method proposed by Burch and 
Dill. Exploiting positive equality allows us to overcome the exponential blow-up 
experienced previously [VB98] when verifying microprocessors with load, store, 
and branch instructions. 



1 Introduction 

For automatically reasoning about pipelined processors, Burch and Dill demonstrated 
the value of using propositional logic, extended with uninterpreted functions, uninterpreted 
predicates, and the testing of equality [BD94]. Their approach involves abstracting 
the data path as a collection of registers and memories storing data, units such as ALUs 
operating on the data, and various connections and multiplexors providing methods for 
data to be transferred and selected. The operation of units that transform data is abstracted 
as blocks computing functions with no specified properties other than functional 
consistency, i.e., that applications of a function to equal arguments yield equal results: 
x = y implies f(x) = f(y). The state of a register at any point in the computation can 
be represented by a symbolic term, an expression consisting of a combination of domain 
variables, function and predicate applications, and Boolean operations. 

The correctness of a pipelined processor can be expressed as a formula in this logic 
that compares for equality the terms describing the results produced by the processor to 
those produced by an instruction set reference model. In their paper, Burch and Dill also 
describe a decision procedure for their logic based on theorem proving search methods. 
It uses combinatorial search coupled with algorithms for maintaining a partitioning of 
the terms into equivalence classes based on the equalities that hold at a given step of the 
search. 

Burch and Dill’s work has generated considerable interest in the use of uninterpreted 
functions to abstract data operations in processor verification. A common theme has 
been to adopt Boolean methods, either to allow integration of uninterpreted functions 
into symbolic model checkers [DPR98,BBCZ98], or to allow the use of Binary Decision 
Diagrams in the decision procedure [HKGB97,GSZAS98,VB98]. Boolean methods allow 
a more direct modeling of the control logic of hardware designs and thus can be 
applied to actual processor designs rather than highly abstracted models. In addition 
to BDD-based decision procedures, Boolean methods could use some of the recently 
developed satisfiability procedures for propositional logic. In principle, Boolean methods 
could outperform decision procedures based on theorem proving search methods, 
especially when verifying processors with more complex control logic. 

Boolean methods can be used to decide the validity of a formula containing terms and 
uninterpreted functions by exploiting the property that a given formula contains a limited 
number of function applications and therefore can be proved to be universally valid by 
considering its interpretation over a sufficiently large, but finite domain [Ack54]. The 
formula to be verified can be translated into one in propositional logic, using vectors of 
propositional variables to encode the possible values generated by function applications 
[HKGB97]. Our implementation of such an approach [VB98] as part of a BDD-based 
symbolic simulation system was successful at verifying simple pipelined data paths. We 
found, however, that the computational resources grew exponentially as we increased 
the pipeline depth. Modeling the interactions between successive instructions flowing 
through the pipeline, as well as the functional consistency of the ALU results, precludes 
having an ordering of the variables encoding term values that yields compact BDDs. 
Similarly, we found that extending the data path to a complete processor by adding either 
load and store instructions or instruction fetch logic supporting jumps and conditional 
branches led to impossible BDD variable ordering requirements. Goel et al [GSZAS98] 
presented an alternate approach to using BDDs to decide the validity of formulas in the 
logic of equality with uninterpreted functions. They use Boolean variables to encode the 
equality relations between terms, rather than to encode term values. Their experimental 
results were also somewhat disappointing. To date, the possibility that Boolean methods 
could outperform theorem proving methods has not been realized. 

In this paper, we show that the characteristics of the formulas generated when modeling 
processor pipelines can be exploited to greatly reduce the number of propositional 
variables that are introduced when translating the formula into propositional logic. We 
distinguish a class of terms we call p-terms for which equations, i.e., equality comparisons 
between terms, can appear only in monotonically positive formulas. Such formulas 
are suitable for describing the top-level correctness condition, but not for modeling any 
control decisions in the hardware. By applying suitable abstractions to the hardware 
model, we can express the functionality of data values and instruction addresses with 
p-terms. 

A decision procedure can exploit the restricted uses of p-terms by considering only 
“maximally diverse” interpretations of the associated “p-function” symbols, where every 
function application yields a different value except when constrained by functional 
consistency. In translating the formula into propositional logic, we can then use vectors with 
fixed bit patterns rather than propositional variables to encode the possible results of 
function applications. This reduction in variables greatly simplifies the BDDs generated, 
avoiding the exponential blow-up experienced by other procedures. 

Others have recognized the value of restricting the testing of equality when modeling 
the flow of data in pipelines. Berezin et al [BBCZ98] generate a model of an execution 
unit suitable for symbolic model checking in which the data values and operations 
are kept abstract. In our terminology, their functional terms are all p-terms. They use 
fixed bit patterns to represent the initial states of registers, much as we replace p-term 
domain variables by fixed bit patterns. To model the outcome of each program operation, 
they generate an entry in a “reference file” and refer to the result by a pointer to this 
file. These pointers are similar to the bit patterns we generate to denote the p-function 
application outcomes. Damm et al consider an even more restricted logic that allows 
them to determine the universal validity of a formula by considering only interpretations 
over the domain {0, 1}. Verifying an execution unit in which the data path width is 
reduced to a single bit then suffices to prove its correctness for all possible widths. In 
comparison to these other efforts, we maintain the full generality of the unrestricted 
functional terms of Burch and Dill while exploiting the efficiency gains possible with p- 
terms. In our processor model, we can abstract register identifiers as unrestricted terms, 
while modeling program data and instruction data as p-terms. In contrast, both [BBCZ98] 
and [DPR98] used bit encodings of register identifiers and were unable to scale their 
verifications to a realistic number of registers. 

In a different paper in these proceedings, Pnueli et al. [PRSS99] also propose a method 
to exploit the polarity of the equations in a formula containing uninterpreted functions 
with equality. They describe an algorithm to generate small domains for each domain 
variable such that the universal validity of the formula can be determined by considering 
only interpretations in which the variables range over their restricted domains. A key 
difference of their work is that they examine the equation structure after replacing all 
function application terms with domain variables and introducing functional consistency 
constraints as described by Ackermann [Ack54]. These consistency constraints typically 
contain large numbers of equations — far more than occur in the original formula — that 
mask the original p-term structure. In addition, we use a new method of replacing function 
application terms with domain variables. Our scheme allows us to exploit maximal 
diversity by assigning fixed values to the domain variables generated while expanding 
p-function application terms. 

In the remainder of the paper, we first define the syntax and semantics of our logic by 
extending that of Burch and Dill. We prove our central result concerning the need to 
consider only maximally diverse interpretations when deciding the validity of formulas 
in our logic. We describe a method of translating formulas into propositional logic. We 
discuss the abstractions required to model processor pipelines in our logic. Finally, we 
present experimental results showing our ability to verify a simple, but complete pipelined 
processor. A more detailed presentation with complete proofs is given in [BGV99]. 



2 Logic of Equality with Uninterpreted Functions (EUF) 

The logic of Equality with Uninterpreted Functions (EUF) presented by Burch and Dill 
[BD94] can be expressed by the following syntax: 

term    ::= ITE(formula, term, term) 
          | function-symbol(term, ..., term) 
formula ::= true | false | (term = term) 
          | (formula ∧ formula) | (formula ∨ formula) | ¬formula 
          | predicate-symbol(term, ..., term) 

In this logic, formulas have truth values while terms have values from some arbitrary 
domain. Terms are formed by applications of uninterpreted function symbols and by 
applications of the ITE (for “if-then-else”) operator. The ITE operator chooses between 
two terms based on a Boolean control value, i.e., ITE(true, x1, x2) yields x1 while 
ITE(false, x1, x2) yields x2. Formulas are formed by comparing two terms for equality, 
by applying an uninterpreted predicate symbol to a list of terms, and by combining 
formulas using Boolean connectives. A formula expressing equality between two terms 
is called an equation. We use expression to refer to either a term or a formula. 

Every function symbol f has an associated order, denoted ord(f), indicating the number 
of terms it takes as arguments. Function symbols of order zero are referred to as domain 
variables. We use the shortened form v rather than v() to denote an instance of a domain 
variable. Similarly, every predicate p has an associated order ord(p). Predicates of order 
zero are referred to as propositional variables. 

The truth of a formula is defined relative to a nonempty domain V of values and an 
interpretation I of the function and predicate symbols. Interpretation I assigns to each 
function symbol of order k a function from $V^k$ to V, and to each predicate symbol of 
order k a function from $V^k$ to {true, false}. Given an interpretation I of the function 
and predicate symbols and an expression E, we can define the valuation of E under I, 
denoted I[E], according to its syntactic structure. I[E] will be an element of the domain 
when E is a term, and a truth value when E is a formula. 

A formula F is said to be true under interpretation I when I[F] equals true. It is said 
to be valid over domain V when it is true for all interpretations over domain V. F is 
said to be universally valid when it is valid over all domains. It can be shown that if a 
formula is valid over some suitably large domain, then it is universally valid [Ack54]. 
In particular, it suffices to have a domain as large as the number of syntactically distinct 
function application terms occurring in F. 
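
The finite-domain argument can be made concrete with a brute-force check. The sketch below is only an illustration of the semantics, not the decision procedure of this paper: expressions are encoded as nested tuples (an assumed encoding), uninterpreted predicates and the constants true and false are omitted for brevity, and `valid_over` enumerates every interpretation of the listed function symbols over a domain of the given size.

```python
from itertools import product


def evaluate(e, interp):
    """Valuation I[E] of an expression given as a nested tuple."""
    op = e[0]
    if op == "app":  # ("app", symbol, *argument_terms); no arguments = domain variable
        args = tuple(evaluate(a, interp) for a in e[2:])
        return interp[e[1]][args]
    if op == "ite":
        return evaluate(e[2], interp) if evaluate(e[1], interp) else evaluate(e[3], interp)
    if op == "eq":
        return evaluate(e[1], interp) == evaluate(e[2], interp)
    if op == "and":
        return evaluate(e[1], interp) and evaluate(e[2], interp)
    if op == "or":
        return evaluate(e[1], interp) or evaluate(e[2], interp)
    if op == "not":
        return not evaluate(e[1], interp)
    raise ValueError(f"unknown operator {op!r}")


def valid_over(formula, orders, size):
    """True iff the formula holds under every interpretation over range(size).
    `orders` maps each function symbol to its order."""
    domain = range(size)
    symbols = sorted(orders)
    tables = []
    for s in symbols:
        keys = list(product(domain, repeat=orders[s]))
        # Every function from domain^k to domain, stored as a lookup table.
        tables.append([dict(zip(keys, vals)) for vals in product(domain, repeat=len(keys))])
    return all(evaluate(formula, dict(zip(symbols, choice))) for choice in product(*tables))


x, y = ("app", "x"), ("app", "y")
fx, fy = ("app", "f", x), ("app", "f", y)
# Functional consistency: x = y implies f(x) = f(y).
consistency = ("or", ("not", ("eq", x, y)), ("eq", fx, fy))
print(valid_over(consistency, {"x": 0, "y": 0, "f": 1}, 4))
print(valid_over(("eq", x, y), {"x": 0, "y": 0}, 2))
```

The example formula contains four syntactically distinct function application terms (x, y, f(x), f(y)), so a domain of size 4 suffices here. The enumeration grows very quickly with the domain size, which is the cost that the propositional encodings discussed in this paper aim to avoid.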



3 Logic of Positive Equality with Uninterpreted Functions (PEUF) 

3.1 Syntax 

PEUF is an extended logic based on EUF given by the following syntax: 

g-term    ::= ITE(formula, g-term, g-term) 
            | g-function-symbol(p-term, ..., p-term) 
p-term    ::= g-term | ITE(formula, p-term, p-term) 
            | p-function-symbol(p-term, ..., p-term) 
formula   ::= true | false | (g-term = g-term) 
            | (formula ∧ formula) | (formula ∨ formula) | ¬formula 
            | predicate-symbol(p-term, ..., p-term) 
p-formula ::= formula | (p-term = p-term) 
            | (p-formula ∧ p-formula) | (p-formula ∨ p-formula) 

This logic has two disjoint classes of function symbols giving two classes of terms. 
General terms, or g-terms, correspond to terms in EUF. Syntactically, a g-term is a g- 
function application or an ITE term in which the two result terms are hereditarily built 
from g-function applications and ITEs. 

The new class of terms is called positive terms, or p-terms. P-terms may not appear in 
negative equations, i.e., equations within the scope of a logical negation. The syntax is 
restricted in a way that prevents p-terms from appearing in negative equations. When 
two p-terms are compared for equality, the result is a special, restricted kind of formula 
called a p-formula. P-formulas are built up using only the monotonically positive Boolean 
operations A and V. P-formulas may not be placed under a negation sign, and cannot be 
used as the control for an ITE operation. 

Note that our syntax allows any g-term to be “promoted” to a p-term. Throughout the syntax 
definition, we require function and predicate symbols to take p-terms as arguments. 
However, since g-terms can be promoted, the requirement to use p-terms as arguments 
does not restrict the use of g-function symbols or g-terms. In essence, g-function symbols 
may be used as freely in our logic as in EUF, but the p-function symbols are restricted. 

Observe that PEUF does not extend the expressive power of EUF — we could translate 
any PEUF expression into EUF by considering the g-terms and p-terms to be terms and 
the p-formulas to be formulas. Instead, the benefit of PEUF is that by distinguishing 
some portion of a formula as satisfying a restricted set of properties, we can radically 
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reduce the number of different interpretations we must consider when proving that a 
p-formula is universally valid. 
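
The syntactic restriction can be read as a classification rule: a function symbol becomes a g-function symbol as soon as one of its applications is compared in an equation that lies under a negation or inside an ITE control, and it is a p-function symbol otherwise. The following Python sketch illustrates that reading. The tuple encoding of terms and formulas and the names `app` and `classify_symbols` are assumptions made for this illustration only; they are not part of the definition of the logic.

```python
def app(symbol, *args):
    """Function application term; a variable is an application with no arguments."""
    return ("app", symbol, args)


def classify_symbols(formula):
    """Split the function symbols of `formula` into (p_symbols, g_symbols)."""
    seen, g_symbols = set(), set()

    def mark_heads(term):
        # Symbols whose applications can supply the value of `term`.
        if term[0] == "app":
            g_symbols.add(term[1])
        else:  # ("ite", control, then_term, else_term)
            mark_heads(term[2])
            mark_heads(term[3])

    def visit_term(term):
        if term[0] == "app":
            seen.add(term[1])
            for arg in term[2]:
                visit_term(arg)
        else:
            visit_formula(term[1], positive=False)  # ITE controls are general formulas
            visit_term(term[2])
            visit_term(term[3])

    def visit_formula(f, positive):
        kind = f[0]
        if kind == "eq":
            if not positive:  # negative equation: both sides must be g-terms
                mark_heads(f[1])
                mark_heads(f[2])
            visit_term(f[1])
            visit_term(f[2])
        elif kind == "not":
            visit_formula(f[1], positive=False)  # everything below a negation is general
        elif kind in ("and", "or"):
            visit_formula(f[1], positive)
            visit_formula(f[2], positive)
        elif kind == "pred":
            for arg in f[2]:
                visit_term(arg)

    visit_formula(formula, positive=True)
    return seen - g_symbols, g_symbols


# not(x = y) or g(f(x)) = g(f(y))
x, y = app("x"), app("y")
example = ("or", ("not", ("eq", x, y)),
                 ("eq", app("g", app("f", x)), app("g", app("f", y))))
p_symbols, g_symbols = classify_symbols(example)
```

On this example the sketch reports `f` and `g` as p-function symbols and `x` and `y` as g-function symbols, since only `x` and `y` are compared under a negation.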



3.2 Diverse Interpretations 

Let $\mathcal{T}$ be a set of terms, where a term may be either a g-term or a p-term. We classify
terms as either p-function applications, g-function applications, or ITE terms, according
to their top-level operation. The first two categories are collectively referred to as function
application terms. For any formula or p-formula $F$, define $\mathcal{T}(F)$ as the set of all function
application terms occurring in $F$.

An interpretation $I$ partitions a term set $\mathcal{T}$ into a set of equivalence classes, where
terms $T_1$ and $T_2$ are equivalent under $I$, written $T_1 \approx_I T_2$, when $I[T_1]$ equals $I[T_2]$.

Interpretation $I'$ is said to be a refinement of $I$ for term set $\mathcal{T}$ when $T_1 \approx_{I'} T_2$ implies
$T_1 \approx_I T_2$ for every pair of terms $T_1$ and $T_2$ in $\mathcal{T}$. $I'$ is a proper refinement of $I$ for
$\mathcal{T}$ when it is a refinement and there is at least one pair of terms $T_1, T_2 \in \mathcal{T}$ such that
$T_1 \approx_I T_2$, but $T_1 \not\approx_{I'} T_2$.
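
The refinement relation compares which pairs of terms two interpretations identify. A minimal Python sketch follows, assuming each interpretation is given simply as a dictionary from term names to the domain values they evaluate to; the function names are hypothetical and chosen for this illustration.

```python
from itertools import combinations


def equivalent_pairs(valuation, terms):
    """Unordered pairs of terms that the valuation maps to the same value."""
    return {frozenset((a, b))
            for a, b in combinations(terms, 2)
            if valuation[a] == valuation[b]}


def is_refinement(new, old, terms):
    """Every pair identified by `new` is also identified by `old`."""
    return equivalent_pairs(new, terms) <= equivalent_pairs(old, terms)


def is_proper_refinement(new, old, terms):
    """A refinement that separates at least one pair that `old` identifies."""
    return equivalent_pairs(new, terms) < equivalent_pairs(old, terms)


terms = ["f(x)", "g(y)", "x"]
coarse = {"f(x)": 0, "g(y)": 0, "x": 1}   # f(x) and g(y) share a class
fine = {"f(x)": 2, "g(y)": 0, "x": 1}     # the shared class has been split
```

Here `fine` is a proper refinement of `coarse`, while `coarse` is not a refinement of `fine`.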

Let $\Sigma$ denote a subset of the function symbols in formula $F$. An interpretation $I$ is said
to be diverse for $F$ with respect to $\Sigma$ when it provides a maximal partitioning of the
function application terms in $\mathcal{T}(F)$ having a top-level function symbol from $\Sigma$ relative
to each other and to the other function application terms, but subject to the constraints
of functional consistency. That is, for $T_1$ of the form $f(S_1, \ldots, S_k)$, where $f \in \Sigma$, an
interpretation $I$ is diverse with respect to $\Sigma$ if $I$ has $T_1 \approx_I T_2$ only in the case where $T_2$
is also a term of the form $f(U_1, \ldots, U_k)$, and $S_i \approx_I U_i$ for all $i$ such that $1 \le i \le k$.
If we let $\Sigma_p(F)$ denote the set of all p-function symbols in $F$, then interpretation $I$ is
said to be maximally diverse when it is diverse with respect to $\Sigma_p(F)$. Note that this
property requires the p-function application terms to be in separate equivalence classes
from the g-function application terms.
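
The diversity condition can be checked directly on a finite set of function application terms. The sketch below is an illustration under assumptions: terms are encoded as nested tuples `(symbol, argument_terms)`, the interpretation is given as a dictionary from terms to values, and the name `is_diverse` is hypothetical.

```python
def is_diverse(value, apps, sigma):
    """Check diversity with respect to the symbol set `sigma`.

    apps  : function application terms, each a tuple (symbol, args) where
            args is a tuple of terms that also occur in `apps`
    value : dict mapping each term in `apps` to its domain value
    """
    for t1 in apps:
        if t1[0] not in sigma:
            continue
        for t2 in apps:
            if t1 == t2 or value[t1] != value[t2]:
                continue
            # Equal values are allowed only when functional consistency forces them.
            same_shape = t1[0] == t2[0] and len(t1[1]) == len(t2[1])
            if not same_shape:
                return False
            if any(value[s] != value[u] for s, u in zip(t1[1], t2[1])):
                return False
    return True


x, y = ("x", ()), ("y", ())
fx, fy = ("f", (x,)), ("f", (y,))
apps = [x, y, fx, fy]

forced = {x: 0, y: 0, fx: 1, fy: 1}      # f(x) = f(y) because x = y
collapsed = {x: 0, y: 1, fx: 2, fy: 2}   # f(x) = f(y) although x != y
```

With `sigma = {"f"}`, the valuation `forced` is diverse and `collapsed` is not.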

Theorem 1. P-formula $F$ is universally valid if and only if it is true in all maximally
diverse interpretations.

First, it is clear that if $F$ is universally valid, then $F$ is true in all maximally diverse
interpretations. We prove via the following lemma that if $F$ is true in all maximally
diverse interpretations, then it is universally valid.

Lemma 1. If interpretation $I$ is not maximally diverse for p-formula $F$, then there is
an interpretation $I'$ that is a proper refinement of $I$ such that $I'[F] \Rightarrow I[F]$.

Proof Sketch: Let $T_1$ be a term occurring in $F$ of the form $f_1(S_1, \ldots, S_{k_1})$, where $f_1$
is a p-function symbol. Let $T_2$ be a term occurring in $F$ of the form $f_2(U_1, \ldots, U_{k_2})$,
where $f_2$ may be either a p-function or a g-function symbol. Assume furthermore that
$I[T_1] = I[T_2] = z$, but that either symbols $f_1$ and $f_2$ differ or $I[S_i] \ne I[U_i]$ for some
value of $i$.

Let $z'$ be a value not in $\mathcal{D}$, and define a new domain $\mathcal{D}' = \mathcal{D} \cup \{z'\}$. Our strategy is
to construct an interpretation $I'$ over $\mathcal{D}'$ that partitions the terms in $\mathcal{T}(F)$ in the same
way as $I$, except that it splits the class containing terms $T_1$ and $T_2$ into two parts: one
containing $T_1$ and evaluating to $z'$, and the other containing $T_2$ and evaluating to $z$.

Define function $h: \mathcal{D}' \to \mathcal{D}$ to map elements of $\mathcal{D}'$ back to their counterparts in $\mathcal{D}$, i.e.,
$h(z') = z$, while all other values of $x$ give $h(x) = x$.

For p-function symbol $f_1$, define $I'(f_1)(x_1, \ldots, x_{k_1})$ as $z'$ when $h(x_i) = I[S_i]$ for
all $1 \le i \le k_1$, and as $I(f_1)(h(x_1), \ldots, h(x_{k_1}))$ otherwise. For other function and
predicate symbols, $I'$ is defined to preserve the functionality of interpretation $I$, while
also treating argument values of $z'$ the same as $z$. That is, $I'(f)$ for function symbol $f$
having $ord(f) = k$ is defined such that $I'(f)(x_1, \ldots, x_k) = I(f)(h(x_1), \ldots, h(x_k))$.

One can show that interpretation $I'$ maintains the values of all formulas and g-terms as
occur under interpretation $I$. Some of the p-terms that evaluate to $z$ under $I$, including
$T_1$, evaluate to $z'$. Others, including $T_2$, continue to evaluate to $z$. With respect to
p-formulas, consider first an equation of the form $S_a = S_b$ where $S_a$ and $S_b$ are p-terms. The
equation will yield the same value under both interpretations except under the condition
that $S_a$ and $S_b$ are split into different parts of the class that originally evaluated to $z$,
in which case the comparison will yield true under $I$, but false under $I'$. In any case,
we maintain the property that $I'[S_a = S_b] \Rightarrow I[S_a = S_b]$. This implication relation is
preserved by conjunctions and disjunctions of p-formulas, due to the monotonicity of
these operations. By this argument we can see that $I'$ is a proper refinement of $I$ for
$\mathcal{T}(F)$ and that $I'[F] \Rightarrow I[F]$. □

Theorem 1 is proved by repeatedly applying Lemma 1. One can show that any
interpretation $I$ of a p-formula $F$ can be refined to a maximally diverse interpretation $I^*$ for
$F$ such that $I^*[F]$ implies $I[F]$. It follows that the truth of $F$ for all maximally diverse
interpretations implies its truth for all possible interpretations.



4 Exploiting Positive Equality in a Decision Procedure 

A decision procedure for PEUF must determine whether a given p-formula is universally 
valid. Theorem 1 shows that we can consider only interpretations in which the values 
produced by the application of any p-function symbol differ from those produced by the 
applications of any other p-function or g-function symbol. We can therefore consider 
the different p-function symbols to yield values over domains disjoint from one another 
and from the domain of g-function values. In addition, we can consider each application 
of a p-function symbol to yield a distinct value, except when its arguments match those 
of some other application. 

We describe a decision procedure that first transforms an arbitrary EUF formula into one 
containing only domain and propositional variables. This restricted class of formulas 
can readily be translated into propositional formulas by using bit vectors as the domain 
of interpretation. The transformation can exploit positive equality by using fixed bit 
patterns rather than vectors of propositional variables to encode the domain variables 
representing p-function application results.

4.1 Eliminating Function and Predicate Applications in EUF

We illustrate our method by considering the formula

$x = y \Rightarrow g(f(x)) = g(f(y))$    (1)

Eliminating the implication gives $\neg(x = y) \lor g(f(x)) = g(f(y))$, and hence both $f$
and $g$ are p-function symbols, while $x$ and $y$ are g-function symbols. We introduce
domain variables $vf_1$ and $vf_2$ and replace term $f(x)$ with $vf_1$ and term $f(y)$ with the
term $ITE(y = x, vf_1, vf_2)$. Observe that as we consider interpretations with different
values for variables $vf_1$ and $vf_2$, we implicitly cover all values that an interpretation
of function symbol $f$ may yield for arguments $x$ and $y$. The ITE structure enforces
functional consistency: when $I(x) = I(y)$, we will have both terms evaluate to $I(vf_1)$.

These replacements give a formula: $\neg(x = y) \lor g(vf_1) = g(ITE(y = x, vf_1, vf_2))$. We
then introduce domain variables $vg_1$ and $vg_2$, and replace the first application of $g$ with
$vg_1$, and the second with $ITE(ITE(y = x, vf_1, vf_2) = vf_1, vg_1, vg_2)$. Our final form is
then:

$\neg(x = y) \lor vg_1 = ITE(ITE(y = x, vf_1, vf_2) = vf_1, vg_1, vg_2)$    (2)

The complete procedure generalizes that shown for the simple example. Suppose
formula $F$ contains $n$ syntactically distinct terms $T_1, T_2, \ldots, T_n$ having the application of
function symbol $f$ as the top-level operation. We refer to these as $f$-application terms.
We introduce domain variables $vf_1, \ldots, vf_n$ and replace each term $T_i$ with a nested ITE
structure $U_i$ of the form

$U_i = ITE(C_{i,1}, vf_1, ITE(C_{i,2}, vf_2, \cdots ITE(C_{i,i-1}, vf_{i-1}, vf_i) \cdots))$

where the formula $C_{i,j}$ is true iff the arguments to the top-level application of $f$ in the
terms $T_i$ and $T_j$ have the same values. The result of replacing every $f$-application term
$T_i$ in $F$ by the new term $U_i$ is a formula that we call $F^{(f)}$.
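
The nested ITE structure can be made concrete by evaluating it over a small finite domain. The following Python sketch is an illustration, not the procedure of [BGV99]: the helper names `ite`, `nested_ite_values`, and `equation2` are hypothetical, each application is assumed to take a single argument, and the exhaustive loop over a three-element domain is only a sanity check rather than a proof of universal validity.

```python
from itertools import product


def ite(condition, then_value, else_value):
    return then_value if condition else else_value


def nested_ite_values(arg_values, var_values):
    """Value of U_i for each application of one function symbol.

    arg_values[i] is the value of the argument of the i-th application and
    var_values[i] is the value of its domain variable vf_i, so that
    U_i = ITE(C_i1, vf_1, ITE(C_i2, vf_2, ... vf_i)).
    """
    results = []
    for i, arg in enumerate(arg_values):
        value = var_values[i]                # innermost case: vf_i itself
        for j in range(i - 1, -1, -1):       # wrap outward, ending with C_i1
            value = ite(arg == arg_values[j], var_values[j], value)
        results.append(value)
    return results


def equation2(x, y, vf1, vf2, vg1, vg2):
    """not(x = y) or g(f(x)) = g(f(y)) after eliminating f and then g."""
    f_x, f_y = nested_ite_values([x, y], [vf1, vf2])
    g_fx, g_fy = nested_ite_values([f_x, f_y], [vg1, vg2])
    return (x != y) or g_fx == g_fy


holds_everywhere = all(equation2(*values)
                       for values in product(range(3), repeat=6))
```

Applying `nested_ite_values` first to the applications of $f$ and then to those of $g$ mirrors the order used in the example, and the transformed formula evaluates to true for every assignment in the enumerated domain.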

We remove all function symbols of nonzero order from $F$ by repeating this process. A
similar process is used to eliminate applications of predicate symbols having nonzero
order, except that we introduce propositional variables $vp_1, vp_2, \ldots, vp_n$ when replacing
applications of predicate symbol $p$. We call the final result of this process the formula
$F^*$. Complete details are presented in [BGV99].

Theorem 2. For EUF formula $F$, the transformation process yields a formula $F^*$
containing only domain and propositional variables and such that $F$ is universally valid if
and only if $F^*$ is universally valid.

Proof Sketch: To prove this theorem, we first show that our procedure for replacing all
instances of function symbol $f$ in an arbitrary formula $G$ by nested ITE terms to yield
a formula $G^{(f)}$ preserves universal validity. (1) $G^{(f)}$ universally valid $\Rightarrow$ $G$
universally valid. For any interpretation $I$ of the function and predicate symbols in $G$, we can
construct an interpretation $\tilde{I}$ of the symbols in $G^{(f)}$ such that $\tilde{I}[G^{(f)}] = I[G]$.
Interpretation $\tilde{I}$ is defined by extending $I$ to include interpretations of the domain variables
$vf_1, \ldots, vf_n$. Each such variable $vf_i$ is given the interpretation $\tilde{I}(vf_i) = I[T_i]$, i.e., the
value of $f$-application term $T_i$ under $I$.

(2) Conversely, $G$ universally valid $\Rightarrow$ $G^{(f)}$ universally valid. For any interpretation $\tilde{I}$
of the function and predicate symbols in $G^{(f)}$, we can define an interpretation $I$ of the
symbols in $G$ such that $I[G] = \tilde{I}[G^{(f)}]$. This interpretation is defined by introducing an
interpretation of function symbol $f$ such that the value yielded when evaluating each
$f$-application term $T_i$ under $I$ matches that yielded for the nested ITE structure $U_i$ under
$\tilde{I}$.

By a similar process we show that our procedure for replacing predicate applications 
preserves universal validity. The theorem is then proved by inducting on the number of 
function and predicate symbols. □ 

4.2 Using Fixed Values for P-Function Applications 

We can exploit the maximal diversity property by using fixed domain values rather than 
domain variables when replacing p-function applications by nested ITE terms. First, 
consider the effect of replacing all instances of a function symbol / by nested ITE terms, 
as described earlier, yielding a formula $F^{(f)}$ with new domain variables $vf_1, \ldots, vf_n$.

Lemma 2. If $f \in \Sigma$, then for any interpretation $I$ that is diverse for $F$ with respect
to $\Sigma$, there is an interpretation $\tilde{I}$ that is diverse for $F^{(f)}$ with respect to $\Sigma - \{f\} \cup
\{vf_1, \ldots, vf_n\}$ such that $I[F] = \tilde{I}[F^{(f)}]$.

Proof Sketch: The proof of this lemma requires a more refined argument than that of
Theorem 2. If we were to define $\tilde{I}(vf_i)$ to be $I[T_i]$ for each domain variable $vf_i$, we may
not have a diverse interpretation with respect to the newly-generated variables. Instead,
we define $\tilde{I}(vf_i)$ to be $I[T_i]$ only if there is no value $j < i$ such that the arguments of
$f$-application terms $T_j$ and $T_i$ have equal valuations under $I$. Otherwise we let $z'$ be a
value not in $\mathcal{D}$, define a new domain $\mathcal{D}' = \mathcal{D} \cup \{z'\}$, and let $\tilde{I}(vf_i) = z'$. It can readily
be seen that the value assigned to this variable will not affect the valuation of nested ITE
structure $U_i$ under interpretation $\tilde{I}$, and hence it can be arbitrary. □

Suppose we apply the transformation process of Theorem 2 to a p-formula $F$ to generate
a formula $F^*$, and that in this process, we introduce a set of new domain variables $V$ to
replace the applications of the p-function symbols. Let $\Sigma_p^*(F)$ be the union of the set of
domain variables in $\Sigma_p(F)$ and $V$. That is, $\Sigma_p^*(F)$ consists of those domain variables
in the original formula $F$ that were p-function symbols as well as the domain variables
generated when replacing applications of p-function symbols. Let $\Sigma_g^*(F)$ be the domain
variables in $F^*$ that are not in $\Sigma_p^*(F)$. These variables were either g-function symbols
in $F$ or were generated when replacing g-function applications.

We first observe that we can generate all maximally diverse interpretations of $F$ by
considering only interpretations of the variables in $F^*$ that assign distinct values to the
variables in $\Sigma_p^*(F)$:

Theorem 3. PEUF formula $F$ is universally valid if and only if its translation $F^*$ is
true for every interpretation $I^*$ such that if $v_p$ is a variable in $\Sigma_p^*(F)$ and $v$ is any other
domain variable in $F^*$, then $I^*(v_p) \ne I^*(v)$.

Proof Sketch: This theorem follows by inducting on the number of p-function symbols
in $F$, using Lemma 2 to prove the induction step. □

Observe that the nested ITE structures we generate when replacing function applications 
involve many equations in the formulas controlling ITE operations. These can cause 
function symbols that appeared as p-function symbols in the original formula to be g-
function symbols in $F^*$. In addition, many of the newly-generated variables will not be
p-function symbols in $F^*$. For example, variables $vf_1$ and $vf_2$ are g-function symbols
in Equation 2. Nonetheless, this theorem shows that we can still restrict our attention to 
interpretations that are diverse with respect to these variables. 

Furthermore, we can choose particular domains of sufficient size and assign fixed
interpretations to the variables in $\Sigma_p^*(F)$. Select disjoint domains $\mathcal{D}_p$ and $\mathcal{D}_g$ for the variables
in $\Sigma_p^*(F)$ and $\Sigma_g^*(F)$, respectively, such that $|\mathcal{D}_p| \ge |\Sigma_p^*(F)|$ and $|\mathcal{D}_g| \ge |\Sigma_g^*(F)|$. Let
$\alpha$ be any 1-1 mapping $\alpha: \Sigma_p^*(F) \to \mathcal{D}_p$.

Corollary 1. PEUF formula $F$ is universally valid if and only if its translation $F^*$ is true
for every interpretation $I^*$ such that $I^*(v_p) = \alpha(v_p)$ for every variable $v_p$ in $\Sigma_p^*(F)$,
and $I^*(v_g)$ is in $\mathcal{D}_g$ for every variable $v_g$ in $\Sigma_g^*(F)$.

Proof Sketch: Any interpretation that is diverse with respect to $\Sigma_p^*(F)$ defines a 1-1
mapping from the variables in $\Sigma_p^*(F)$ to the domain. We can therefore find an isomorphic
interpretation satisfying the requirements for $I^*$ listed above. □

As an illustration, consider formula $F^*$ given by Equation 2 resulting from the
transformation of formula $F$ given by Equation 1. We have $\Sigma_p^*(F) = \{vf_1, vf_2, vg_1, vg_2\}$ and
$\Sigma_g^*(F) = \{x, y\}$. Suppose we use bit vectors of length 3 as the domain of
interpretation. Then we could let $\mathcal{D}_g$ be $\{(0, 0, 0), (0, 0, 1)\}$. We assign $x$ the fixed interpretation
$(0, 0, 0)$, and $y$ the interpretation $(0, 0, a)$ where $a$ is a propositional variable. Viewing
truth values true and false as representing bit values 1 and 0, respectively, the different
interpretations of $a$ will then cover both the case where $x$ and $y$ have equal interpretations
as well as where they are distinct. For variables $vf_1$, $vf_2$, $vg_1$, and $vg_2$, we can assign
fixed interpretations $(1, 0, 0)$, $(1, 0, 1)$, $(1, 1, 0)$, and $(1, 1, 1)$, respectively. Thus, we can
translate our formula $F$ into a propositional formula having just a single propositional
variable.
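
The encoding in this illustration is small enough to evaluate by hand or by machine. The Python sketch below plugs the fixed bit patterns into Equation 2 and enumerates the single remaining propositional variable $a$. The function names and the dictionary `FIXED` are hypothetical helpers for this illustration.

```python
def ite(condition, then_value, else_value):
    return then_value if condition else else_value


def equation2(x, y, vf1, vf2, vg1, vg2):
    """not(x = y) or vg1 = ITE(ITE(y = x, vf1, vf2) = vf1, vg1, vg2)."""
    f_y = ite(y == x, vf1, vf2)
    return (x != y) or vg1 == ite(f_y == vf1, vg1, vg2)


# Fixed, pairwise distinct patterns for the variables that replaced
# p-function applications; none of them lies in the domain used for x and y.
FIXED = {"vf1": (1, 0, 0), "vf2": (1, 0, 1), "vg1": (1, 1, 0), "vg2": (1, 1, 1)}


def check_with_fixed_patterns():
    """Evaluate Equation 2 for both values of the propositional variable a."""
    x = (0, 0, 0)
    return {a: equation2(x, (0, 0, a), **FIXED) for a in (0, 1)}


results = check_with_fixed_patterns()
```

Both entries of `results` are true, so two evaluations suffice here in place of an enumeration over six 3-bit vectors.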

Ackermann also describes a scheme for replacing function application terms by domain 
variables [Ack54]. Using his scheme, we simply replace each instance of a function 
application by a newly-generated domain variable and then introduce constraints ex- 
pressing functional consistency. For the example formula given by Equation 1 we would 
get a modified formula: 

$((x = y \Rightarrow vf_1 = vf_2) \land (vf_1 = vf_2 \Rightarrow vg_1 = vg_2)) \Rightarrow (x = y \Rightarrow vg_1 = vg_2)$

Observe, however, that there is no clear way to exploit the maximal diversity property
with this translated form. If we replace $vf_1$ and $vf_2$ by distinct values in the above case,
we fail to consider any interpretations in which arguments $x$ and $y$ have equal values.
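
This loss of coverage can be observed by enumeration. In the Python sketch below, `ackermann_antecedent` encodes the functional consistency constraints of the formula above, and `covered_equal_argument_cases` (a hypothetical helper) lists the assignments with equal arguments that still satisfy those constraints once $vf_1$ and $vf_2$ are fixed. The two-element domain is an arbitrary choice for illustration.

```python
from itertools import product


def ackermann_antecedent(x, y, vf1, vf2, vg1, vg2):
    """(x = y => vf1 = vf2) and (vf1 = vf2 => vg1 = vg2)."""
    return ((x != y) or vf1 == vf2) and ((vf1 != vf2) or vg1 == vg2)


def covered_equal_argument_cases(vf1, vf2, domain=range(2)):
    """Assignments with x = y that satisfy the consistency constraints."""
    return [(x, y, vg1, vg2)
            for x, y, vg1, vg2 in product(domain, repeat=4)
            if x == y and ackermann_antecedent(x, y, vf1, vf2, vg1, vg2)]


with_distinct_values = covered_equal_argument_cases(vf1=0, vf2=1)
with_equal_values = covered_equal_argument_cases(vf1=0, vf2=0)
```

With distinct fixed values the first list is empty: every assignment with $x = y$ violates the consistency constraint, so the implication holds vacuously and those interpretations are never examined.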

5 Modeling Microprocessors in PEUF

Our interest is in verifying pipelined microprocessors, proving their equivalence to an 
unpipelined instruction set architecture model. We use the approach pioneered by Burch 
and Dill [BD94] in which the abstraction function from pipeline state to architectural 
state is computed by symbolically simulating a flushing of the pipeline state and then 
projecting away the state of all but the architectural state elements, such as the register 
file, program counter, and data memory. Operationally, we construct two sets of p-terms 
describing the final values of the state elements resulting from two different symbolic 
simulation sequences: one from the pipeline model and one from the instruction set
model. The correctness condition is represented by a p-formula expressing the equality 
of these two sets of p-terms. 

Our approach starts with an RTL or gate-level model of the microprocessor and performs 
a series of abstractions to create a model of the data path using terms that satisfy the 
restrictions of PEUF. Examining the structure of a pipelined processor, we find that the 
signals we wish to abstract as terms can be classified as either program data, instruction 
addresses, or register identifiers. By proper construction of the data path model, both 
program data and instruction addresses can be represented as p-terms. Register identi- 
fiers, on the other hand, must be modeled as g-terms, because their comparisons control 
the stall and bypass logic. The remaining control logic is kept at the bit level. 

In order to generate such a model, we must abstract the operation of some of the processor 
units. For example, the data path ALU is abstracted as an uninterpreted p-function, 
generating a data value given its data and control inputs. We model the PC incrementer 
and the branch target logic as uninterpreted functions generating instruction addresses. 
We model the branch decision logic as an uninterpreted predicate indicating whether or 
not to take the branch based on data and control inputs. This allows us to abstract away 
the data equality test used by the branch-on-equal instruction. The instruction memory 
can be abstracted as an uninterpreted function, since it is considered to be read-only. 

To model the register file, we use the memory model described by Burch and Dill [BD94], 
creating a nested ITE structure to record the history of writes to the memory. This 
approach requires equations between memory addresses controlling the ITE operations. 
For the register file, such equations are allowed since g-term register identifiers serve 
as addresses. For the data memory, however, the memory addresses are p-term program 
data, and hence such equations cannot be used. Instead, we model the data memory as 
a generic state machine, changing state in some arbitrary way for each write operation, 
and returning some arbitrary value dependent on the state and the address for each read 
operation. Such an abstraction technique is sound, but it does not capture all of the 
properties of a memory. It is satisfactory for modeling processors in which there is no 
reordering of writes relative to each other or relative to reads. 
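
The nested ITE memory model used for the register file can be sketched as follows. This is an illustration of the idea of recording the write history, not the authors' implementation: the function name `read`, the string representation of terms, and the name `rf0` for the uninterpreted initial contents are all assumptions.

```python
def read(writes, addr, initial):
    """Symbolic value read from `addr` after a sequence of writes.

    writes  : list of (address_term, data_term) pairs, oldest first
    addr    : address term being read
    initial : name of an uninterpreted function giving the initial contents
    """
    expr = f"{initial}({addr})"
    # Wrap from the oldest write outward so the most recent write is tested first.
    for write_addr, write_data in writes:
        expr = f"ITE({addr} = {write_addr}, {write_data}, {expr})"
    return expr


history = [("r1", "d1"), ("r2", "d2")]
value = read(history, "r3", "rf0")
```

For this history the read yields `ITE(r3 = r2, d2, ITE(r3 = r1, d1, rf0(r3)))`. Every ITE control is an equation between addresses, which is why the addresses must be g-terms for this model to be expressible.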

6 Experimental Results 

In [VB98], we described the implementation of a symbolic simulator for verifying
pipelined systems using vectors of Boolean variables to encode domain variables, effectively
treating all terms as g-terms. This simulation is performed directly on a modified gate-
level representation of the processor. In this modified version, we replace all state holding 
elements with behavioral models we call Efficient Memory Models (EMMs). In addition 
all data-transformation elements (e.g., ALUs, shifters, PC incrementers) are replaced by 
read-only EMMs, which effectively implement the transformation of function applica- 
tions into nested ITE expressions. Modifying this program to exploit maximal diversity 
simply involves having the EMMs generate expressions containing fixed bit patterns 
rather than vectors of Boolean variables. All performance results presented here were 
measured on a 125 MHz Sun Microsystems SPARC-20. 

We constructed several simple pipeline processor designs based on the MIPS instruction 
set. We abstract register identifiers as g-terms, and hence our verification covers all 
possible numbers of program registers including the 32 of the MIPS instruction set. The 
simplest version of the pipeline implements ten different Register-Register and Register- 
Immediate instructions. Our program could verify this design in 48 seconds of CPU time 
and just 7 MB of memory using vectors of Boolean variables to encode domain variables. 
Using fixed bit patterns reduces the complexity of the verification to 6 seconds and 2 
MB. 

We then added a memory stage to implement load and store instructions. An interlock 
stalls the processor one cycle when a load instruction is followed by an instruction 
requiring the loaded result. Treating all terms as g-terms and using vectors of Boolean 
variables to encode domain variables, we could not verify this data path, despite running 
for over 2000 seconds. The fact that both addresses and data for the memory come from 
the register file induces a circular constraint on the ordering of BDD variables encoding 
the terms. On the other hand, exploiting maximal diversity by using fixed bit patterns 
for register values eliminates these variable ordering concerns. As a consequence, we 
could verify the 32-bit version of this design in just 12 CPU seconds using 1.8 MB. 

Finally, we verified a complete CPU, with a 5-stage pipeline implementing 10 ALU
instructions, load and store, and MIPS instructions j (jump with target computed from 
instruction word), jr (jump using register value as target), and beq (branch on equal). 
This design is comparable to the DLX design verified by Burch and Dill in [BD94], 
although our version is closer to an actual gate-level implementation. We were unable 
to verify this processor using the scheme of [VB98]. Having instruction addresses de- 
pendent on instruction or data values leads to exponential BDD growth when modeling 
the instruction memory. Modeling instruction addresses as p-terms, on the other hand, 
makes this verification tractable. We can verify the 32-bit version of this processor using 169
CPU seconds and 7.5 MB. 



7 Conclusions 

Eliminating Boolean variables in the encoding of terms representing program data and 
instruction addresses has given us a major breakthrough in our ability to verify pipelined 
processors. Our BDD variables now only encode control conditions and register identi-
fiers. For classic RISC pipelines, the resulting state space is small and regular enough
to be handled readily with BDDs.

We believe that there are many optimizations that will yield further improvements in
the performance of Boolean methods for deciding formulas involving uninterpreted 
functions. We have found that relaxing functional consistency constraints to allow inde- 
pendent functionality of different instructions, as was done in [DPR98], can dramatically 
improve both memory and time performance. We have devised a variation on the scheme 
of [GSZAS98] for generating a propositional formula using Boolean variables to en- 
code the relations between terms [BGV99]. Our method exploits maximal diversity to 
greatly reduce the number of propositional variables in the generated formula. We are 
also considering the use of satisfiability checkers rather than BDDs for performing our 
tautology checking.



A Toolbox for the Analysis of Discrete Event
Dynamic Systems



Peter Buchholz and Peter Kemper* 

Informatik IV, Universität Dortmund, D-44221 Dortmund, Germany



Abstract. We present a collection of tools for functional and quanti- 
tative analysis of discrete event dynamic systems (DEDS). Models can 
be formulated as a set of automata with synchronous communication 
or as Petri nets. Analysis takes place with a strong emphasis on state 
based analysis methods using Kronecker representations and ordered na- 
tural decision diagrams. Independent tools provide access to orthogonal 
techniques from different fields including computation of bisimulation 
equivalences, model checking, numerical analysis of Markov chains, and
simulation. Two file formats are defined to provide a simple exchange
mechanism between independent tools, which allows various combinations
of tools to be built.



1 Introduction 

Tools for the specification and analysis of DEDS exist in a rich variety and 
show a certain combinatorial explosion from the set of modeling formalisms and 
the set of analysis techniques. Selection of the “best” modeling formalism is 
a highly emotional topic, but for selection of analysis techniques criteria boil 
down to availability of implementations and applicability for a given model. We 
observed severe difficulties in exchanging models between different tools, such 
that for our toolbox a strong emphasis is on a simple exchange of information 
between the independent tools it contains. Fig. 1 gives an overview: the tools are 
arranged around two file formats by which models can be specified as a set of 
automata with synchronous communication and as a hierarchical, colored Petri 
net. The Petri net formalism - named abstract Petri net notation (APNN) [2] 
- integrates several kinds of Petri nets, including place/transition nets, colored 
Petri nets, timed nets with stochastic timing, hierarchical nets using place
and/or transition refinement, and superposed nets based on transition fusion.

Networks of automata with synchronous interaction are specified in a diffe- 
rent format, using state transition matrices and synchronization via equal labels. 
This format directly corresponds to a Kronecker representation of the state tran- 
sition matrix of the composed model. The representation is compositional and 
uses a Kronecker product to express synchronization and a Kronecker sum for 
independent state transitions. The state space of the composed model can be
represented in a (usually) very space-efficient manner by a directed acyclic graph,
a generalization of ordered binary decision diagrams (OBDDs). The latter
allows exchange of state space descriptions among tools with low effort [8,14]. In
the sequel we briefly sketch ways to use the APNN toolbox for modeling and
analysis; for details we refer to [1].
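
To illustrate how a Kronecker representation composes automata, the following Python sketch builds the transition matrix of two small automata from a Kronecker sum (independent local transitions) and a Kronecker product (one synchronized label). It is a sketch under assumptions: the two-state automata, the matrices `L1`, `L2`, `S1`, `S2`, and all function names are invented for this example and do not reflect the file format of the toolbox.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def kron_sum(a, b):
    """Kronecker sum: each automaton moves while the other stays put."""
    return add(kron(a, identity(len(b))), kron(identity(len(a)), b))


# Local transitions 0 -> 1 in each automaton.
L1 = [[0, 1], [0, 0]]
L2 = [[0, 1], [0, 0]]
# Transitions 1 -> 0 carrying the same label, taken jointly.
S1 = [[0, 0], [1, 0]]
S2 = [[0, 0], [1, 0]]

# Composed state (s1, s2) has index 2 * s1 + s2.
composed = add(kron_sum(L1, L2), kron(S1, S2))
```

The composed matrix has 16 entries, while the representation stores four 2 x 2 matrices; the saving grows with the number and size of the components.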



2 Several Ways to Obtain a Model 

Models can be either generated by the graphical user interface (GUI) contained
in the toolbox or translated from other modeling formalisms supported by other 
tools. A generation by hand, directly editing the textual description at APNN 
level or the matrix representation at the automata level is possible in principle 
but not recommended. APNNed [9] provides a GUI for the APNN format. It is 
a JAVA implementation of a Petri net editor supporting hierarchical nets (ba- 
sed on refinement of places and transitions) and colored nets with finite color 
sets. APNNed animates the dynamic behavior of a model by the token game 
in three ways: a) interactive, b) automatic selection of transitions to fire and c)
trace-driven by importing a trace generated elsewhere. APNNed also provides 
functionality to start various analysis tools and to present their results. The ana- 
lysis tools export results in a specific output format which can be used by the 
GUI or report generators for result presentation. The toolbox also provides trans- 
formers which translate other formats into APNN. These include a translator 
for generalized stochastic Petri nets (GSPNs) specified by GreatSPN to APNN,
a translator for PEP low level nets (which are Place/Transition nets) to APNN
and vice versa, and a transformer for Net/Condition Event systems (NCES)
to APNN. The latter is a non-trivial mapping [10,13] since certain features of
NCES have no direct correspondence in the Petri net formalism. A model can 
also be described at the level of networks of synchronized automata. However, 
so far no direct user interface is available at this level. A model for a network of 
automata is automatically generated from a Petri net model in APNN format 
by the state space exploration tool, cf. Fig. 1. This tool does not necessarily 
perform an exploration of the overall state space, it is also used to do an explo- 
ration by components which maps a set of submodels of a Petri net to a set of 
automata (provided the Petri net is appropriately structured). Nevertheless, the 
interface format can also be used to obtain suitable networks of automata from
other compositional formalisms, e.g. from CCS-like process algebraic terms, if
they are in the form of $(p_1 | p_2 | \cdots | p_n) \setminus L$ with processes/agents $p_1, \ldots, p_n$ and
a set of synchronization labels $L$; however, a corresponding tool is currently not
available in our toolbox.



3 Several Ways to Analyze a Model 

The APNN toolbox provides tools for functional and quantitative analysis which 
apply either at net level or automata level. Only a minority is devoted to Petri 
nets at the APNN level including computation of invariants and a simulator for 
quantitative analysis of timed nets. A state space exploration tool transforms 
an APNN description into the format of the automata level with optional
exploration of the overall state space of the composed model as in [12]. A strong
emphasis is on tools which exploit the Kronecker structure implicitly given at the
automata level. At this level a tool for computation of several equivalences of
bisimulation type and subsequent aggregation of automata is available. In particular,
a weak backward bisimulation preserving reachability is useful in combination
with state space exploration of composed automata, since a disaggregation
module can finally retranslate the resulting state space of an aggregated system into
the state space of the original system [7,8]. A generalization of ordered binary
decision diagrams (OBDDs) where nodes are allowed to have a variable number
of outgoing arcs (ONDDs) has been successfully applied to represent extremely
large state spaces in a space-efficient way [8,7,14], such that a corresponding file
format allows state spaces to be communicated between tools. Furthermore, a model
checker for computation tree logic (CTL) is available at this level [14],
implementing classical model checking algorithms adapted to Kronecker algebra. An
additional specialized model checker is exclusively devoted to check the liveness 
property in terms of Petri net theory. In case of (partial) deadlocks it generates 
a trace of transition firings which can be animated by APNNed. The APNN 
toolbox also contains a variety of tools for quantitative analysis of stochastic 
models based on the numerical solution of Markov chains which again profit 
from the Kronecker structure available at the automata level. These tools can 
be further distinguished according to hierarchical, block-structured Kronecker 
representations or modular Kronecker representations, see e.g. [5,3,4,6,11] for 
corresponding algorithms. A compositional representation based on Kronecker
algebra is a key advantage of the analysis tools in our toolbox, since this gives a
very space-efficient data structure for large state transition systems with possibly
millions of states.



Abstract. In this short paper we briefly describe a tool which is based on a
Markovian stochastic process algebra. The tool offers both model specification 
and quantitative model analysis in a compositional fashion, wrapped in a user- 
friendly graphical front-end. 



1 Compositional Performance Modelling 

Classical process algebras have been designed as compositional description formalisms 
for concurrent systems. In stochastic process algebras temporal information is attached 
to actions in the form of continuous random variables representing activity durations, 
making it possible to specify and analyse both qualitative and quantitative properties. 
This short paper is about the TIPPtool [5], a tool that emerged from the TIPP project which
focussed on a basic framework supporting both functional specification and performance 
evaluation in a single, process algebraic formalism [6]. The formalism is basically a 
superset of LOTOS [1], including means to specify exponentially distributed delays. It 
hence provides a bridge between qualitative and quantitative evaluation, the latter based 
on Markov chain analysis. More precisely, the underlying semantics of the specification 
language gives rise to homogeneous continuous time (semi-)Markov chains that can 
be analysed numerically by means of efficient techniques. Besides some support for 
analysis of functional aspects, the tool offers algorithms for numerical performance 
analysis of a given process algebraic specification. Exact and approximate evaluation 
techniques are provided to calculate various measures of interest. The tool also offers 
semi-automatic compositional minimisation of complex models based on equivalence- 
preserving transformations. 

2 Model Specification and Analysis 

The specification language of the TIPPtool is a superset of LOTOS.^ In particular,
a distinguished type of prefix, (a,r); P, is supported, denoting that action a occurs
after a delay Δ which is exponentially distributed with rate parameter r (i.e.
$\mathrm{Prob}\{\Delta \le t\} = 1 - e^{-rt}$); afterwards the process behaves as P.

Actions arising from ordinary prefix a; P are called immediate actions. They happen
as soon as possible if not prevented by the environment, following the maximal progress
assumption. In particular, internal (or hidden) immediate actions are assumed to happen
immediately when enabled. In addition to the basic language elements, process instantiation,
parametric processes and inter-process communication can be used to model
complex dependences, such as value passing or mobility.

* Current affiliation: Systems Validation Centre, University of Twente, Enschede, The Netherlands

** Current affiliation: Development Access, Lucent Technologies, Nürnberg, Germany

^ Data types are treated more liberally than in standard LOTOS; integers are a built-in data type.

Conservatively extending classical process algebras, a labelled transition system 
(LTS) is generated from the system specification using structural operational rules [6]. 
Corresponding to timed and immediate actions there are two types of transitions between 
states: timed transitions and immediate transitions. The LTS can hence be regarded 
as a semi-Markov process. Under certain conditions (checked by the tool) the semi- 
Markov process can be transformed into a continuous time Markov chain. Verifying 
these properties involves equivalence preserving transformations, based on a stochastic 
variant of Milner’s observational congruence [6]. Since this relation is compositional, 
it can be applied to minimise the state space of a specification in a componentwise 
fashion. This minimisation abstracts from internal immediate steps and it aggregates the 
Markov chain based on the concept of lumpability [10], while preserving functional and 
stochastic information. For a particular Markov chain, a system of ordinary differential 
equations needs to be solved in order to obtain the state probabilities at a particular 
time instant t (transient analysis). Alternatively, solving a linear system of equations 
leads to the state probabilities in the equilibrium (stationary analysis). These limiting 
probabilities (where t → ∞) are known to exist for arbitrary finite (homogeneous,
continuous time) Markov chains. 
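
The two kinds of analysis described above can be sketched for a small chain as follows. This is a minimal illustration, not the TIPPtool's numerical engine: it swaps in uniformisation (a standard technique) for both the transient and the stationary solution, and it assumes a dense generator matrix `q` for an irreducible chain with at least one positive rate. All function names are hypothetical.

```python
import math


def uniformise(q):
    """Return (rate, P) where P = I + Q/rate is a stochastic matrix."""
    n = len(q)
    rate = 1.1 * max(-q[i][i] for i in range(n))   # strictly above every exit rate
    p = [[(1.0 if i == j else 0.0) + q[i][j] / rate for j in range(n)]
         for i in range(n)]
    return rate, p


def step(v, p):
    """One vector-matrix product v * P."""
    n = len(v)
    return [sum(v[i] * p[i][j] for i in range(n)) for j in range(n)]


def stationary(q, tol=1e-13, max_iter=100000):
    """Equilibrium probabilities (pi * Q = 0) by power iteration on P."""
    _, p = uniformise(q)
    pi = [1.0 / len(q)] * len(q)
    for _ in range(max_iter):
        nxt = step(pi, p)
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


def transient(q, pi0, t, eps=1e-12, max_terms=100000):
    """State probabilities at time t: sum_k Poisson(rate*t; k) * pi0 * P^k."""
    rate, p = uniformise(q)
    weight = math.exp(-rate * t)       # Poisson probability of zero jumps
    v = list(pi0)
    result = [weight * x for x in v]
    total, k = weight, 0
    while 1.0 - total > eps and k < max_terms:
        k += 1
        v = step(v, p)
        weight *= rate * t / k
        total += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result
```

For a two-state chain with rates 1 (state 0 to 1) and 3 (state 1 to 0), `stationary([[-1.0, 1.0], [3.0, -3.0]])` approaches the known equilibrium (3/4, 1/4).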

3 Tool Features and Structure 

In its current version 2.3, the TIPPtool provides the following functionality: 

- Model description by means of a LOTOS-based notation, 

- Reachability analysis based on the operational semantics, 

- Algorithms for deadlock detection and tracing to a given state, 

- Algorithms for checking bisimulation-style equivalences and for the minimisation
of (sub-)models, 

- Stationary and transient analysis of the underlying Markov chain, 

- Functions for the calculation of performance and dependability measures, 

- Support of experiment series, 

- Output of numerical results using the tool pxgraph, 

- Interfacing with other tools. 

The tool consists of several components whose interaction is shown in the figure on the
right. Specifications can be created with an editor (Edit component). The Generate/Aggregate
component is responsible for parsing the specification, for the generation of the LTS and
for its minimisation according to an equivalence notion. The user may currently choose
between four (stochastic variants of) classical congruences. This minimisation is known to
be particularly beneficial if it
is applied to components of a larger specification in a stepwise, compositional fashion. 
In the TIPPtool, semi-automatic compositional minimisation is supported in an elegant 
way: By highlighting a certain fragment of the specification with the mouse, it is possible 
to invoke compositional minimisation of that fragment. When the minimised representation
is computed, a new specification is generated automatically, where the selected
fragment has been replaced by its minimised representation.

Via the Options, the user can specify various measures to be calculated, such as
the probability of the system being in a certain subset of states, or the throughput (i.e.
the mean frequency of occurrence) of some action. An experiment description contains
information about model parameters to be varied during analysis. A series of experiments
can be carried out automatically in an efficient manner, generating numerical results for
different values of a certain model parameter, while the state space only needs to be
generated once. Models can be analysed with the Analyse module. This module offers
various numerical solution algorithms for the underlying stochastic process, among them
two approximate methods [9,12].
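
The idea of an experiment series, where the state space is generated once and only the rates are re-instantiated, can be sketched as follows. The example model (a queue with a fixed number of places), the rate names `lam` and `mu`, and all function names are assumptions chosen for illustration; the dense Gauss-Jordan solver stands in for the tool's numerical methods and is only suitable for very small chains.

```python
def build_transitions(capacity):
    """Symbolic transitions (source, action, rate name, target), generated once."""
    ts = []
    for i in range(capacity):
        ts.append((i, "arrive", "lam", i + 1))
        ts.append((i + 1, "serve", "mu", i))
    return ts


def generator(ts, n, rates):
    """Instantiate the generator matrix for concrete rate values."""
    q = [[0.0] * n for _ in range(n)]
    for src, _, name, dst in ts:
        q[src][dst] += rates[name]
        q[src][src] -= rates[name]
    return q


def solve_stationary(q):
    """Solve pi * Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(q)
    a = [[q[j][i] for j in range(n)] + [0.0] for i in range(n)]  # Q transposed
    a[-1] = [1.0] * n + [1.0]            # replace one equation by normalisation
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def throughput(ts, pi, rates, action):
    """Mean frequency of `action`: sum of pi[s] * rate over its transitions."""
    return sum(pi[src] * rates[name] for src, act, name, _ in ts if act == action)


def experiment_series(capacity, mu, lam_values):
    ts = build_transitions(capacity)     # state space built only once
    results = []
    for lam in lam_values:
        rates = {"lam": lam, "mu": mu}
        pi = solve_stationary(generator(ts, capacity + 1, rates))
        results.append((lam, throughput(ts, pi, rates, "serve")))
    return results
```

Calling `experiment_series(3, 2.0, [0.5, 1.0, 2.0])` yields one (parameter value, throughput) pair per experiment, which is the kind of series that would then be plotted.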

After an experiment series has been carried out, the results are presented graphically
with the tool pxgraph from UC Berkeley, cf. the screenshot on the right. The
Export module of the tool provides interfaces to three other tools, PEPP [4], TOPO [11],
and CADP [2]. The first interface generates stochastic task graphs [8], for which the
tool PEPP offers a wide range of both exact and approximate analysis algorithms, some of
which work even for general distributions. The second interface provides support for the
translation of specifications into a format suitable for the LOTOS tool TOPO. Among other
functionalities, TOPO is capable of building C programs from LOTOS specifications. The
third interface can be used to exploit the bisimulation equivalence algorithms of the tool
ALDEBARAN, as well as other tools (ec2, autograph), for visualisation or functional
verification purposes. Here, the interface is on the level of the state space.

We used the programming language Standard ML for implementing the parser, the 
semantics, the bisimulation algorithms and for the approximate Markov chain solution 
methods. The numerical analysis part is written in C, on top of a library which provides 
data structures for sparse matrices (SparseLib 1.3 from Kenneth Kundert, UC Berkeley).
This library has been extended by iterative solution methods for stationary and transient 
analysis. The clear interface of the library makes it easy to integrate other solution 
methods into the tool. The communication with the state space generator is done via 
ASCII-files. For computing the measures, shell-scripts are used, which are based on 
standard UNIX-tools such as grep, awk and sed. Finally, the graphical user interface has 
been implemented using the scripting language Tcl/Tk. The communication between 
the GUI and the other tools is done via UNIX-pipes. 
4 Conclusion

In this short paper, we have presented the status quo of the TIPPtool. We have described
the particular features of a stochastic process algebra based specification formalism, together
with the distinguishing components of the tool. To the best of our knowledge, the
TIPPtool is the only existing tool offering compositional minimisation of Markov chain
models. TIPPtool is available free of charge for non-commercial institutions; more details
can be found at http://www7.informatik.uni-erlangen.de/tipp/. Among
others, the tool has been applied to the study of performance and dependability aspects of 
the plain old telephony system [7], a robot control system [3], and a hospital information 
system [13]. So far, models with up to 10^ states have been tackled compositionally. 
1 Motivation

Verification plays a central role in the security of Java bytecode: the Java byte- 
code verifier performs a static analysis to ensure that bytecode loaded over a 
network has certain security related properties. When this is the case, the byte- 
code can be efficiently interpreted without runtime security checks. 

Our research concerns the theoretical foundations of bytecode verification 
and alternative approaches to specifying and checking security properties. This is 
important as currently the “security policy” for Java bytecode is given informally 
by a natural language document [LY96] and the bytecode verifier itself is a closed 
system (part of the Java virtual machine). We believe that there are advantages 
to more formal approaches to security. A formal approach can disambiguate 
the current policy and provide a basis for verification tools. It can also help 
expose bugs or weaknesses that can corrupt Java security [MF97]. Moreover, 
when the formal specification is realized in a logic and verification is based on a 
theorem prover, extensions become possible such as integrating the verification 
of security properties with other kinds of verification, e.g., proof-carrying code 
[NL96,NL98]. 

2 Approach 

We provide a formal foundation to bytecode verification based on model check- 
ing. The idea, which has similarities with data flow analysis and abstract inter- 
pretation [Sch98], is as follows. The bytecode for a Java method M constitutes 
a state transition system where the states are defined by the states of the Java 
Virtual Machine (JVM) running M, and the transitions are given by the semantics
of the JVM instructions used in M. From M we can compute an abstraction Mfin
that abstracts the state-transition system to a simpler one whose states are
defined by the values of the JVM’s program counter, the operand stack, a stack
pointer, and the method’s local variables. The actual values of the stack positions
and local variables are abstracted away and simply represented by their type
information. The transition rules of Mfin are defined likewise by the semantics
of the JVM machine instructions with respect to our abstraction. Since only
finitely many types appear in each method, the resulting abstraction Mfin is
finite; the size of the state space is exponential in the number of local variables
and the maximal stack height.
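
As a rough illustration of why the abstract state space is finite but exponential (a back-of-the-envelope bound, not a formula taken from the paper), assume a method with $P$ program-counter values, maximal stack height $H$, $L$ local variables, and a finite set $T$ of abstract types, including any marker used for unused slots. These symbols are introduced here for illustration only. An abstract state consists of a program counter, a stack pointer, $H$ stack slots and $L$ local variables, so the number of abstract states is bounded by:

```latex
|S(M_{fin})| \;\le\; P \cdot (H + 1) \cdot |T|^{H} \cdot |T|^{L}
```

The factor $|T|^{H+L}$ is the exponential part; only a small fraction of these states is normally reachable from the initial state.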

Afterwards, we can apply a model checker to Mfin. The properties that we model
check correspond to the type safety checks performed by the Java bytecode
verifier. For example, we specify that each transition in Mfin that represents a
machine instruction in M finds appropriately typed data in the locations (stack
or local variables) it uses. The model checker then either responds that the bytecode
is secure (with respect to these properties) or provides a counter-example
to its security.

3 Architectural Description 

The overall structure of our system is depicted in Figure 1. As input it takes
a Java class file as well as a specification of an abstraction of the Java virtual
machine. The specification defines the states of the abstract machine and how
each bytecode instruction changes the machine’s state. For each instruction, a
precondition to its execution is given (e.g. that the operand stack must contain
enough operands of appropriate type) and also invariants are stated (e.g. that
the stack may not exceed its maximal size). These are the properties that are
model checked.

The core routine (method abstraction) translates bytecode into a finite state
transition system using the specification of the abstract machine. Separating
the machine specification from the translation gives us a modular system where
we can easily change the virtual machine and the properties checked. Our system
is also modular with respect to the model checker used. Currently we have implemented
two different back-ends: one that compiles the transition system and properties
to the input language of the SMV model checker and a second that generates
output in the SPIN language Promela.




Fig. 1. Structure of the compiler 
4 Example Output

As a simple example (even here we must elide details) we give (a) a Java program, 
(b) the corresponding bytecode, and (c) the output of our system, which is input 
for the SPIN model checker. 



    public static int fac(int a) {
        if (a == 0)
            return 1;
        else
            return a * fac(a - 1);
    }

(a) Java Code

    .method public static fac(I)I
    .limit stack 3
    .limit locals 1
    .line 8
        iload_0
        ifne Label1
    .line 9
        iconst_1
        ireturn
    .line 11
    Label1:
        iload_0
        iload_0
        iconst_1
        isub
        invokestatic Sample/fac(I)I
        imul
        ireturn
    .end method

(b) Bytecode

    #define pc_is_1 (pc == 1)
    #define pc_is_2 (pc == 2)

    /* Conditions to be checked */
    #define cond_1 (locals[0] == INT)
    #define cond_2 (st[stp_st - 1] == INT)
    [...]

    /* State of the abstract machine */
    byte pc;         /* program counter */
    byte st[3];      /* operand stack   */
    byte stp_st;     /* stack pointer   */
    byte locals[1];  /* local variables */

    /* Process that watches if the conditions hold */
    proctype asrt_fac() {
      assert((!pc_is_1 || cond_1) && [...]) }

    /* Process that models the transition system */
    proctype meth_fac() {
      do
      /* iload_0 */
      :: pc_is_1 -> atomic {
           pc = pc + 1;
           st[stp_st] = locals[0];
           stp_st = stp_st + 1 };
      /* ifne Label1 */
      :: pc_is_2 -> atomic {
           if
           :: pc = pc + 5;
           :: pc = pc + 3
           fi;
           stp_st = stp_st - 1 };
      [...]
      od }

    /* Initialization of the abstract machine */
    init {
      atomic {
        pc = 1; stp_st = 0; locals[0] = INT;
        run meth_fac(); run asrt_fac() } }

(c) Mfin and Properties

The Java program and the bytecode should be clear. We have added some comments
to (c) by hand. In the process meth_fac, the transitions of the method
fac are modeled. For example, the first instruction of the method, iload_0, loads
an integer value from a local variable onto the stack; the corresponding condition
to be checked (cond_1) requires that the respective variable contains an integer
value. The instruction ifne performs a conditional branch, which is modeled by
nondeterministically assigning a new value to the program counter. The process
asrt_fac runs in parallel to the process meth_fac and checks if all conditions
(preconditions and invariants) are fulfilled. SPIN checks this in negligible time
(0.03 seconds).
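
The same kind of check can be sketched without a model checker by exploring the abstract states of fac explicitly. The following is an illustration under stated assumptions, not the authors' tool: the program counter is an instruction index rather than a byte offset as in the Promela output, only the opcodes occurring in fac are handled, and the names `FAC`, `step` and `check` are hypothetical.

```python
from collections import deque

INT = "INT"

# One (opcode, operand) pair per instruction of fac; list index = abstract pc.
FAC = [
    ("iload_0", None),                 # 0
    ("ifne", 4),                       # 1: branch target is instruction 4
    ("iconst_1", None),                # 2
    ("ireturn", None),                 # 3
    ("iload_0", None),                 # 4 (Label1)
    ("iload_0", None),                 # 5
    ("iconst_1", None),                # 6
    ("isub", None),                    # 7
    ("invokestatic", ((INT,), INT)),   # 8: fac(I)I
    ("imul", None),                    # 9
    ("ireturn", None),                 # 10
]


def step(code, state, max_stack):
    """Return (error or None, successor states) for one abstract state."""
    pc, stack, local_vars = state
    op, arg = code[pc]

    def top_is(*types):
        n = len(types)
        return len(stack) >= n and stack[len(stack) - n:] == types

    if op in ("iload_0", "iconst_1"):
        if op == "iload_0" and local_vars[0] != INT:
            return "local 0 is not INT", []
        if len(stack) >= max_stack:
            return "stack overflow", []
        return None, [(pc + 1, stack + (INT,), local_vars)]
    if op == "ifne":
        if not top_is(INT):
            return "ifne needs INT on the stack", []
        rest = stack[:-1]              # both branch outcomes are possible
        return None, [(pc + 1, rest, local_vars), (arg, rest, local_vars)]
    if op in ("isub", "imul"):
        if not top_is(INT, INT):
            return op + " needs two INTs", []
        return None, [(pc + 1, stack[:-2] + (INT,), local_vars)]
    if op == "invokestatic":
        params, ret = arg
        if not top_is(*params):
            return "bad arguments for call", []
        rest = stack[:len(stack) - len(params)]
        return None, [(pc + 1, rest + (ret,), local_vars)]
    if op == "ireturn":
        if not top_is(INT):
            return "ireturn needs INT", []
        return None, []
    raise ValueError("unsupported opcode: " + op)


def check(code, local_types, max_stack):
    """Breadth-first search; returns (None, []) or (error, counter-example)."""
    init = (0, (), tuple(local_types))
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        error, successors = step(code, state, max_stack)
        if error:
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return error, trace[::-1]
        for nxt in successors:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None, []
```

`check(FAC, [INT], 3)` reports no violation, whereas lowering the stack limit to 2 produces a counter-example trace ending at the instruction that would overflow the stack.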

5 Future Work 

We have completed Version 1 of the system. This formalizes and model-checks the 
JavaCard subset of Java, which is used for smartcards [Sun98]. We have chosen
this particular instance of Java for three reasons: first, JavaCard does not allow 
for dynamic class loading, therefore there are no “real-time” requirements for 
bytecode verification. Second, the bytecode verifier for JavaCard lives outside 
the client platform, so it can easily be replaced/extended without modifying the 
platform itself. Finally, our approach can contribute to meeting the high security
requirements that smartcard applications usually have. 

In a future release we plan to extend this version to the full JVM instruction 
set. The only significant problems that might occur are run time requirements for 
the model checker (defined by the time a user is willing to wait when loading a 
class) and multi-threading, which is not possible in JavaCard and could increase 
the model checker’s search space. 
1 Introduction

This paper describes NuSMV, a new symbolic model checker developed as a joint project
between Carnegie Mellon University (CMU) and Istituto per la Ricerca Scientifica
e Tecnologica (IRST). NuSMV is designed to be a well structured, open, flexible and
documented platform for model checking. In order to make NuSMV applicable in tech- 
nology transfer projects, it was designed to be very robust, close to the standards required 
by industry, and to allow for expressive specification languages. 

NuSMV is the result of the reengineering, reimplementation and extension of SMV 
[6], version 2.4.4 (SMV from now on). With respect to SMV, NuSMV has been exten- 
ded and upgraded along three dimensions. First, from the point of view of the system 
functionalities, NuSMV features a textual interaction shell and a graphical interface, 
extended model partitioning techniques, and allows for LTL model checking. Second, 
the system architecture of NuSMV has been designed to be highly modular and open. 
The interdependencies between different modules have been separated, and an external, 
state of the art BDD package [8] has been integrated in the system kernel. Third, the quality
of the implementation has been strongly enhanced. This makes NuSMV a robust,
maintainable and well documented system, with a relatively easy to modify source code.
NuSMV is available at http://afrodite.itc.it:1024/~nusmv/.

2 System Functionalities 

NuSMV can process files written in SMV language [6], and allows for the construction 
of the model with different modalities, reachability analysis, fair CTL model checking, 
computation of quantitative characteristics of the model, and generation of counterex- 
amples. In addition, NuSMV features an enhanced partitioning method for synchronous 
models based on [7], and allows for disjunctive partitioning of asynchronous models, 
and for the verification of invariant properties in combination with reachability analysis. 
Furthermore, NuSMV supports LTL model checking. The algorithm is based on the 
combination of a tableau constructor for the LTL formula with standard CTL model 
checking, along the lines described in [5]. 

NuSMV can work in batch mode, just like SMV, processing an input file according 
to the specified command line options. In addition, NuSMV has an interactive mode: it 
enters a shell performing a read-eval-print loop, and the user can activate the various com- 
putation steps (e.g. parsing, model construction, reachability analysis, model checking) 
as system commands with different options. (This interaction mode is largely inspired 
by the VIS interaction mode [2].) These steps can therefore be invoked separately, pos- 
sibly undone or repeated under different modalities. Each command is associated with 
an on-line help. Furthermore, the internal parameters of the system can be inspected
and modified to tune the verification process. For instance, the NuSMV interactive shell
provides full access to the configuration options of the underlying BDD package. Thus, it
is possible to investigate the effect of different choices (e.g. whether and how to partition
the model, the impact of different cache configurations) on the verification process. It
is also possible to control the application of BDD variable orderings in a particular
phase of the verification (e.g. after the model is built).

Fig. 1. A snapshot of the NuSMV GUI.

On top of the interactive shell, a graphical user interface (GUI from now on) has been 
developed (Figure 1). The GUI provides an integrated environment to edit and verify the 
file containing the model description. It provides graphical access to all the commands 
interpreted by the textual shell of NuSMV, and allows for the modification of the options 
in a menu driven way. Moreover, the GUI offers a formula editor which helps the user 
in writing new specifications. Depending on the kind of formula being edited (e.g. 
propositional, CTL, LTL), various buttons corresponding to modalities and/or boolean 
connectors are activated and deactivated. 

3 System Architecture 

Model checking is often referred to as “push-button” technology. However, it is very 
important to be able to customize the model checker according to the system being 
verified. This is particularly true in technology transfer, when the model checker may 
act as the kernel for a custom verification tool, to be used for a very specific class of 
applications. This may require the development of a translator or a compiler for a (pos- 
sibly proprietary) specification language, and the effective integration of decomposition 
techniques to tackle the state explosion. 

NuSMV has been explicitly designed to be an open system, which can be easily modified,
customized or extended. The system architecture of NuSMV has been structured
and organized in modules. Each module implements a set of functionalities and communicates
with the others via a precisely defined interface. A clear distinction between the
system back-end and front-end has been enforced, in order to make it possible to reuse
the internal components independently of the input language being used to describe the
model.

Fig. 2. The NuSMV system architecture.

The architecture of NuSMV (see Figure 2) is composed of the following modules: 

Kernel. The kernel provides the low level functionalities such as dynamic memory 
allocation, and manipulation of basic data structures (e.g. cons cells, hash tables). The 
kernel also provides all the basic BDD primitives, directly taken from the CUDD [8] 
BDD package. The integration of the CUDD package hides the details of the garbage 
collection. The NuSMV kernel can be used as a black box, following coding standards 
which have been precisely defined. 

Parser. This module implements the routines to process a file written in NuSMV 
language, check its syntactic correctness, and build a parse tree representing the internal 
format of the input file. 

Compiler. This module is responsible for the compilation of the parsed model into 
BDDs. The Instantiation submodule processes the parse tree, and performs the instantia- 
tion of the declared modules, building a description of the finite state machine (FSM) 
representing the model. The Encoding submodule performs the encoding of data types 
and finite ranges into boolean domains. Having separated this module makes it possible
to have different encoding policies which can be more appropriate for different kinds of
variables (e.g. data path, control path). The FSM Compiler submodule provides the routines
for constructing and manipulating FSMs at the BDD level. It is responsible for all the
necessary semantic checks on the read model, such as the absence of circular definitions.
The FSMs can be represented in monolithic or partitioned form [3]. The heuristics used
to perform the conjunctive partitioning of the transition relation and reordering of the
clusters [7] have been developed to work at the BDD level, independently of the input
language. The interface to other modules is given by the primitives for the computation
of the image and counter-image of a set of states. These primitives are independent of
the method used to represent the transition relation.
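
The role of the image and counter-image primitives as an interface can be sketched with explicit sets in place of BDDs. This is an illustration only and does not reflect NuSMV's C API: the class and function names are hypothetical, and only a monolithic and a disjunctively partitioned relation are shown.

```python
class Monolithic:
    """Transition relation stored as one set of (source, target) pairs."""

    def __init__(self, pairs):
        self.pairs = set(pairs)

    def image(self, states):
        return {t for s, t in self.pairs if s in states}

    def preimage(self, states):
        return {s for s, t in self.pairs if t in states}


class DisjunctivelyPartitioned:
    """Transition relation stored as a union of smaller relations."""

    def __init__(self, parts):
        self.parts = [Monolithic(p) for p in parts]

    def image(self, states):
        result = set()
        for part in self.parts:
            result |= part.image(states)
        return result

    def preimage(self, states):
        result = set()
        for part in self.parts:
            result |= part.preimage(states)
        return result


def reachable(fsm, init):
    """Least fixpoint of the image operator, starting from the initial states."""
    reached = set(init)
    frontier = set(init)
    while frontier:
        frontier = fsm.image(frontier) - reached
        reached |= frontier
    return reached


def invariant_holds(fsm, init, good):
    """An invariant holds if every reachable state is a good state."""
    return reachable(fsm, init) <= set(good)
```

Because `reachable` and `invariant_holds` use only `image`, they work unchanged for either representation, which is the independence property the paragraph above describes.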

Model Checking. This module provides the functionalities for reachability, fair CTL 
model checking, invariant checking, and computation of quantitative characteristics. 
Moreover, this module provides the routines for counterexample generation and inspec- 
tion. Counterexamples can be produced with different levels of verbosity, in the form 
of reusable data structures, and can subsequently be inspected and navigated. All these 
routines are independent of the particular method used to represent the FSM. 
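
As a sketch of how CTL operators reduce to fixpoint computations over the counter-image, the following uses explicit state sets instead of BDDs. It is a simplified illustration rather than NuSMV's algorithm: fairness constraints and counterexample generation are omitted, and all names are hypothetical.

```python
def pre(trans, target):
    """Counter-image: states with at least one successor in `target`."""
    return {s for s, t in trans if t in target}


def ex(trans, p):
    """EX p."""
    return pre(trans, p)


def eu(trans, p, q):
    """E[p U q] as a least fixpoint: Z = q or (p and EX Z)."""
    z = set(q)
    while True:
        new = z | (set(p) & pre(trans, z))
        if new == z:
            return z
        z = new


def eg(trans, p):
    """EG p as a greatest fixpoint: Z = p and EX Z."""
    z = set(p)
    while True:
        new = set(p) & pre(trans, z)
        if new == z:
            return z
        z = new


def ef(trans, states, p):
    """EF p = E[true U p]."""
    return eu(trans, states, p)


def ag(trans, states, p):
    """AG p = not EF not p."""
    return set(states) - ef(trans, states, set(states) - set(p))
```

A specification holds in the model if every initial state belongs to the computed set.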

LTL. The LTL module is a separate module which calls an external program that
translates the LTL formula into a tableau suitable to be loaded into NuSMV. This 
program also generates a new CTL formula to be verified on the synchronous product 
of the original system and the generated tableau. 

Interactive shell. From the interactive shell the user has full access to all the functionalities
provided by the system.

Graphical user interface. The graphical user interface has been designed on top of 
the interactive shell. It allows the user to inspect and set the value of the environment 
variables of the system, and provides full access to all the functionalities. 

4 Implementation 

NuSMV has been designed to be robust, close to the standards required by industry and 
easy to maintain and modify. NuSMV is written in ANSI C and is POSIX compliant. 
This makes the system portable to any compliant platform. It has been thoroughly debugged
with Purify (http://www.pureatria.com) to detect memory leaks and runtime
memory corruption errors.

The kernel of NuSMV provides low level functionalities, such as dynamic memory 
allocation, in a way independent from the underlying operating system. Moreover, it 
provides routines for the manipulation of basic data structures such as cons cells, hash 
tables, arrays of generic types, and encapsulates the CUDD BDD package [8]. 

In order to implement the architecture described in Section 3, the source code of NuSMV
has been organized in different packages. NuSMV is composed of 11 packages.
Each package exports a set of routines which manipulate the data structures defined
in the package and which allow the options associated with the functionalities
provided by the package itself to be modified. Moreover, each package is associated with a set of
commands which can be interpreted by the NuSMV interactive shell. We have packages 
for model checking, FSM compilation, BDD interface, LTL model checking and kernel 
functionalities. New packages can be added relatively easily, following precisely defined 
rules. 

The GUI has been developed in Tcl/Tk. It runs as a separate process, synchronously 
communicating with NuSMV by issuing textual commands to the interactive shell, and 
processing the resulting output to display it graphically. 

The code of NuSMV has been documented following the standards of the ext tool
(http://alumnus.caltech.edu/~sedwards/ext), which allows for the automatic
extraction of the programmer manual from the comments in the system source code.
The programmer manual is available in txt or html format, and can be browsed by an
HTML viewer. This tool is also used to generate the on-line help available through the
interactive shell and via the graphical user interface.
The user manual has been written following the texinfo standard, from which
different formats (i.e. postscript, pdf, dvi, info, html) can be automatically generated,
and accessed via an html viewer or in hardcopy.

5 Results and Future Directions 

NuSMV is a robust, well structured and flexible platform, designed to be applicable in 
technology transfer projects. The performance of NuSMV has been compared with
that of SMV by running a number of SMV examples. Despite the fact that NuSMV
gives up some of the optimizations of SMV to simplify the dependencies between
modules, an improvement in computation time has been obtained. In most examples
NuSMV performs better than SMV, in particular for larger examples. This enhancement
in performance is mainly due to the use of the CUDD BDD package.

The NuSMV architecture provides a precise distinction between the front-end, spe- 
cific to the SMV input language, and the back-end (including the heuristics for model 
partitioning and model checking algorithms), which is independent of the input langu- 
age. This separation has been used to develop on top of NuSMV the MBP system. MBP 
is a planner able to synthesize reactive controllers for achieving goals in nondeterministic 
domains [4]. 

Functionalities currently under development are a simulator, which is of paramount 
importance for the user to acquire confidence in the correctness of the model, and a com- 
piler for an imperative style input language, which can often be very convenient in the 
modeling process. Further developments will include the integration of decomposition 
techniques (e.g. abstraction and compositional verification), and new and very promi- 
sing techniques based on the use of efficient procedures for propositional satisfiability, 
following the ideas reported in [1]. 
1 Introduction

Authentication protocols are used in distributed environments to ensure the identity
of the communication partners and to establish a secure communication
between them. With the widespread use of distributed computing (e.g., Internet, 
electronic commerce), authentication protocols have gained much importance. 
Because high values can be at stake, such protocols must have extremely high 
quality and must be resistant with respect to intruders. Therefore, usually formal 
methods are used for their design and verification. In the literature, a variety 
of different methods and techniques for protocol analysis have been developed 
(cf. [Mea94] for an overview). Typically, the methods exhibit their strength in 
different stages of the development of an authentication protocol: in early design 
stages, conformance to a development standard [AG98] and absence of major de- 
ficiencies of a protocol can be ensured by type checking. As a next step, modal 
logics of belief are used to model a protocol and its properties. Such logics (e.g., 
BAN [BAN89], SVO, GNY, or AUTLOG [KW94]) are convenient for the verifi- 
cation of important properties, but are relatively weak with respect to modeling 
intricate intruder scenarios. Here, model-checking approaches (e.g., [KW96]) can 
be used. They can efficiently and automatically analyze a protocol. However, 
they usually cannot provide a positive proof and are limited by the size of the 
state-space they can explore. Methods for verification which are based on CSP,
like [Pau97], avoid this problem by simultaneously modeling a potentially infinite 
number of interleaving protocol runs, but their degree of automatic processing 
(e.g., with Isabelle) is still rather small. 

The tool PIL/Setheo addresses the second stage: PIL/Setheo is capable 
of automatically proving safety properties of authentication protocols, formalized 
in the modal belief logics BAN [BAN89] and AUTLOG [KW94]. PIL/Setheo 
is based on Setheo, an automated theorem prover for first order predicate logic.

2 Requirements and System Architecture 

PIL/Setheo was designed with the goal of practical usability. Therefore, the 
following important requirements are the basis for PIL/Setheo’s system design:
— automatic processing: after specifying the protocol and the desired properties
the tool should run automatically. Response-times are to be kept below one 
minute. 

— representation level: the protocol and its properties are specified in the modal 
BAN or AUTLOG logic. The transformation into first-order logic must be 
kept transparent to the user. Thus, no knowledge about first-order theorem 
proving or Setheo should be required to use PIL/Setheo. 

— human readable proofs: a major benefit of protocol analysis with modal belief 
logics is that the resulting proofs are relatively short and provide valuable 
insights to the protocol designer. This is in sharp contrast to model checking 
techniques (where no proof is provided) and CSP-based techniques which 
produce rather lengthy and complex proofs. Hence, all proofs are to be pre- 
sented on the level of the source logic (BAN or AUTLOG) and must be 
human-readable. 

— feedback on failed conjectures: during development of a protocol, it is likely 
that some of the conjectures cannot be proven due to errors in design or 
formalization. Then, a simple answer “no” (or an endless loop) is rather 
insufficient. Thus PIL/Setheo has to offer several ways to provide feedback 
on what might be wrong in case a proof attempt fails. 

These requirements are reflected in PIL/Setheo’s system architecture. Its 
input is a specification of the protocol’s messages, additional assumptions, and 
the theorems to be proven. The specification language developed for PIL/Setheo 
[Wag97] is close to the underlying modal logic (BAN or AUTLOG). An example 
for a simple protocol (a variant of the RPC handshake) is shown in Figure 1A. 

This input specification is translated into one or more proof tasks in 
first-order logic (in clausal normal form). PIL/Setheo uses the approach of 
meta-interpretation, which transforms each BAN (or AUTLOG) formula into a term. 
A newly introduced predicate symbol holds (abbreviated as h) is true if and only 
if its argument (a translated modal formula) can be derived using the inference 
rules of the respective modal logic. Thus, all inference rules of the BAN (or 
AUTLOG) logic are transformed into first-order implications. For details see 
[Sch97]. 
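
The meta-interpretation step can be illustrated with a minimal sketch. The nested-tuple encoding, the "?"-prefix for variables, and the function names to_term, holds and rule_to_clause are assumptions made for this illustration only; they are not the actual PIL/Setheo translation, which is described in [Sch97]. The nonce-verification rule is given in a simplified form.

```python
# Hypothetical encoding (not PIL/Setheo's own): a BAN formula is a nested
# tuple such as ("believes", "B", ("said", "A", "N_b")); strings starting
# with "?" are rule variables.

def to_term(formula):
    """Translate a nested-tuple BAN formula into a first-order term string."""
    if isinstance(formula, str):
        if formula.startswith("?"):
            return formula[1:].upper()   # variables are written in upper case
        return formula.lower()           # constants are written in lower case
    functor, *args = formula
    return f"{functor}({', '.join(to_term(a) for a in args)})"

def holds(formula):
    """Wrap a translated formula in the meta-predicate h (short for holds)."""
    return f"h({to_term(formula)})"

def rule_to_clause(premises, conclusion):
    """Render an inference rule as a first-order implication over h."""
    body = " & ".join(holds(p) for p in premises)
    return f"{holds(conclusion)} <- {body}"

# Simplified nonce-verification rule of BAN logic:
# if P believes X is fresh and P believes Q said X, then P believes Q believes X.
nonce_verification = rule_to_clause(
    premises=[("believes", "?P", ("fresh", "?X")),
              ("believes", "?P", ("said", "?Q", "?X"))],
    conclusion=("believes", "?P", ("believes", "?Q", "?X")),
)
print(nonce_verification)
```

Each modal formula becomes an ordinary term, so a first-order prover only ever reasons about the single predicate h.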

These proof tasks form the input of Setheo, a high performance theorem 
prover for first-order logic in clausal normal form [Let92]. Setheo features a 
wide variety of techniques for pruning the search space which is traversed in a 
depth-first manner with iterative deepening. When Setheo finds a proof, a 
tree-like model elimination tableau is returned. A proof in this form, however, 
is not readable. Therefore, it is automatically translated into a human-readable 
form using the tool ILF-Setheo [WS97]. After a transformation into a 
sequent-style calculus (block calculus), the proof is syntactically converted 
into a proof of the original BAN (or AUTLOG) logic and typeset using LaTeX. A 
short example of the output is shown in Figure 1B. This representation of the 
proof directly corresponds to the representation level of the input of 
PIL/Setheo (Figure 1A). For details on the notation cf. [BAN89,Sch97]. 
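
Depth-first search with iterative deepening can be sketched as follows. This is a simplified illustration on propositional Horn clauses, not Setheo's model elimination calculus; the names prove and iterative_deepening and the example rules are hypothetical.

```python
# Rules are (head, body) pairs; a fact is a rule with an empty body.

def prove(goal, rules, depth):
    """Depth-bounded backward chaining (depth-first)."""
    if depth == 0:
        return False                      # resource bound reached on this branch
    for head, body in rules:
        if head == goal and all(prove(sub, rules, depth - 1) for sub in body):
            return True
    return False

def iterative_deepening(goal, rules, max_depth):
    """Retry the bounded search with growing depth; return the first depth that succeeds."""
    for depth in range(1, max_depth + 1):
        if prove(goal, rules, depth):
            return depth
    return None                           # no proof found within the resources

# The first rule would send an unbounded depth-first search into an endless loop.
RULES = [("p", ["p"]), ("p", ["q"]), ("q", [])]
print(iterative_deepening("p", RULES, max_depth=5))
print(iterative_deepening("r", RULES, max_depth=5))
```

The depth bound keeps the depth-first search complete in spite of looping rules; a goal that cannot be proven simply exhausts the given resources, which corresponds to the run-time limit discussed in the text.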

A (input):

    Objects:
      principal A,B;
      sharedkey K_a_b, Kp_a_b;
      statement N_a, N_b;
    Assumptions:
      A believes sharedkey K_a_b;
      B believes sharedkey K_a_b;
      A believes B controls sharedkey K_a_b;
      B believes sharedkey Kp_a_b;
      A believes fresh N_a;
      B believes fresh N_b;
    Idealized Protocol:
      message 1: A -> B {N_a}(K_a_b);
      message 2: A <- B {f(N_a),N_b}(K_a_b);
      message 3: A -> B {N_b}(K_a_b);
      message 4: A <- B {sharedkey Kp_a_b}(K_a_b);
    Conjectures: after message 4:
      B believes A believes N_b;

B (output):

    Theorem 1. conjecture.

    Proof. We show directly that
        conjecture.                                   (1)
    Because of Message-Meaning, Assumptions, and by Messages
        ⊢ B |≡ A |~ N_b.                              (2)
    Because of Theorem
        conjecture ← ⊢ B |≡ A |≡ N_b.                 (3)
    Because of Nonce-Verification: ∀P, Q, R:
        P |≡ Q |≡ R ← P |≡ Q |~ R ∧ P |≡ #R.
    Hence by (2) and by Assumptions ⊢ conjecture. Hence by (3) conjecture.
    Thus we have completed the proof of (1). q.e.d.

Fig. 1. Example input (A) and output of PIL/Setheo (B). 

If a conjecture cannot be proven, Setheo usually reaches a run-time limit. 
In order to increase usability of the tool, PIL/Setheo features 
two ways of producing feedback: belief generation and abduction. In the first 
case, PIL/Setheo generates all beliefs which are derivable from the given 
specification and which conform to given syntactic criteria. Let us assume that 
we had “forgotten” the last assumption (B believes fresh N_b, under Assumptions) 
in Figure 1A. Then, our theorem cannot be proven. In that case, the user 
can ask PIL/Setheo which kinds of BAN-formulas B believes. PIL/Setheo, 
which uses a variant of the DELTA-preprocessor [Sch94] to generate the formulas 
in a bottom-up way, returns a list of BAN-formulas (in our example 124). 
PIL/Setheo’s user interface allows the user to further restrict the focus of the 
formulas by specifying a syntactic filter. For example, we might ask what B 
believes to be fresh (freshness is an important issue in protocol analysis with 
BAN-logic). Now, PIL/Setheo returns a much shorter list of formulas (8 in our 
case). From them, it is quite obvious that there are no terms which contain any 
reference to freshness of time-stamp N_b. This is a clear indication that 
something is wrong with that time-stamp: B does not believe in the validity of 
its own time-stamps. This immediately leads to the missing assumption 
B |≡ #(N_b) (B believes fresh N_b), which then yields the desired proof. 
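
Belief generation with a syntactic filter can be sketched as bottom-up (forward-chaining) saturation over ground facts. The sketch below is only an illustration under stated assumptions: the tuple encoding, the helper names (match, subst, generate_beliefs, select), the sharedkey(Q, K) representation and the two simplified BAN rules are hypothetical and do not reproduce the DELTA-preprocessor of [Sch94].

```python
# Formulas are nested tuples; strings starting with "?" are variables.

def match(pattern, fact, env):
    """Match a pattern against a ground fact; return extended bindings or None."""
    if isinstance(pattern, str):
        if pattern.startswith("?"):
            if pattern in env:
                return env if env[pattern] == fact else None
            return {**env, pattern: fact}
        return env if pattern == fact else None
    if not isinstance(fact, tuple) or len(pattern) != len(fact):
        return None
    for p, f in zip(pattern, fact):
        env = match(p, f, env)
        if env is None:
            return None
    return env

def subst(pattern, env):
    """Instantiate a pattern with the given variable bindings."""
    if isinstance(pattern, str):
        return env.get(pattern, pattern)
    return tuple(subst(p, env) for p in pattern)

def all_matches(premises, facts, env):
    """Enumerate all bindings under which every premise matches some fact."""
    if not premises:
        yield env
        return
    for fact in facts:
        new_env = match(premises[0], fact, env)
        if new_env is not None:
            yield from all_matches(premises[1:], facts, new_env)

def generate_beliefs(facts, rules):
    """Saturate the fact set bottom-up until no rule derives anything new."""
    facts = set(facts)
    while True:
        new = {subst(concl, env)
               for prems, concl in rules
               for env in all_matches(prems, list(facts), {})} - facts
        if not new:
            return facts
        facts |= new

def select(facts, pattern):
    """Syntactic filter: keep only the facts that match the pattern."""
    return sorted((f for f in facts if match(pattern, f, {}) is not None), key=repr)

RULES = [  # simplified message-meaning and nonce-verification rules
    ([("believes", "?P", ("sharedkey", "?Q", "?K")), ("sees", "?P", ("enc", "?X", "?K"))],
     ("believes", "?P", ("said", "?Q", "?X"))),
    ([("believes", "?P", ("fresh", "?X")), ("believes", "?P", ("said", "?Q", "?X"))],
     ("believes", "?P", ("believes", "?Q", "?X"))),
]
FRESH_NB = ("believes", "B", ("fresh", "N_b"))
ASSUMPTIONS = {("believes", "B", ("sharedkey", "A", "K_a_b")), FRESH_NB}
MESSAGES = {("sees", "B", ("enc", "N_b", "K_a_b"))}   # message 3 of Figure 1A
CONJECTURE = ("believes", "B", ("believes", "A", "N_b"))

partial = generate_beliefs((ASSUMPTIONS - {FRESH_NB}) | MESSAGES, RULES)
print(CONJECTURE in partial)
print(select(partial, ("believes", "B", ("fresh", "?X"))))
```

With the freshness assumption removed, the conjecture is not among the generated beliefs, and the filter for what B believes to be fresh yields nothing about N_b, which mirrors the diagnosis described above.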

In the abductive mode, additional assumptions (or patterns, like B believes 
the freshness of each time-stamp) can be given by the user. PIL/Setheo then 
tries to prove the theorem and returns a list of (most specific) instantiations 
of the additional assumptions which were required to find a proof with given 
resources. From there, the user can find out those assumptions which might be 
important for the analysis. 
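
The idea of the abductive mode can be sketched with a brute-force stand-in: given ground candidate assumptions (for example, instantiations of the pattern "B believes fresh X"), find the smallest candidate sets that make the theorem derivable. The functions closure and abduce, the string encoding of formulas and the two ground rules are assumptions for this illustration; PIL/Setheo itself obtains the instantiations from the theorem prover rather than by enumeration.

```python
from itertools import combinations

def closure(facts, rules):
    """Forward-chain ground Horn rules (premise set, conclusion) to a fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                changed = True
    return facts

def abduce(assumptions, candidates, rules, goal):
    """Return the smallest sets of candidates whose addition makes the goal derivable."""
    for size in range(len(candidates) + 1):
        hits = [set(extra) for extra in combinations(candidates, size)
                if goal in closure(set(assumptions) | set(extra), rules)]
        if hits:
            return hits
    return []

# Ground instances of simplified message-meaning and nonce-verification rules.
RULES = [
    (frozenset({"B believes sharedkey K_a_b", "B sees {N_b}(K_a_b)"}),
     "B believes A said N_b"),
    (frozenset({"B believes fresh N_b", "B believes A said N_b"}),
     "B believes A believes N_b"),
]
ASSUMPTIONS = {"B believes sharedkey K_a_b", "B sees {N_b}(K_a_b)"}
CANDIDATES = ["B believes fresh N_a", "B believes fresh N_b"]

print(abduce(ASSUMPTIONS, CANDIDATES, RULES, "B believes A believes N_b"))
```

Only the candidate that is actually needed for the proof is reported, which points the user to the assumption that matters for the analysis.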

The user interface for PIL/Setheo is straightforward and easy to use. 
PIL/Setheo uses the tool “make” to make sure that for a complete analysis all 
conjectures have been proven. Upon completion, PIL/Setheo returns a LaTeX 
document containing a full report and all proofs. 

3 Conclusions 

We have used PIL/Setheo to analyze a number of well-known protocols 
(Kerberos, Andrew Secure RPC Handshake, Needham Schroeder, Needham Schroeder 
with public keys, Otway Rees, wide-mouthed frog, Yahalom, CCITT-X.509, 
ISO10181 and others). All proof tasks arising from the verification of these 
protocols (with BAN or AUTLOG) could be shown fully automatically within less 
than one minute per protocol (actual proof times have been below 20 seconds). 
As far as possible with the formalism of belief logics, we were able to “re-detect” 
errors in early versions of the protocols. With its fully automatic operation and 
its capability to generate human-readable proofs in the BAN or AUTLOG logic, 
PIL/Setheo is a powerful, yet easy to use tool, especially suited for early 
protocol design phases. 